

Help


EPA Publications Web Server Guided Tour

The look, feel and functionality of this service are similar to those of its predecessor, the Clarit system, with added functionality. Advanced users of the Clarit-based system should study the Search Techniques section to see how query techniques have changed under the current system.

The main contents of this guided tour are the following:

Simple Search
Search Results
View a Document
Advanced Search
Fields Search
Publication Title Index
Search Techniques

Off-line Text Search



Simple Search

Use the Simple Search to look for specific information. This page contains a number of aids and tools to help you tailor your search statements to find exactly the information you need. Type the search statement in the Enter your search box.

image - Simple search screen

When you are finished, press Search. The EPA Publications Server will execute the search statement and show you a new page with a list of documents that match the search statement.




Search Results

The term Search Results refers to the set of files retrieved immediately after a search is executed. Each file contains one or more terms from the search statement. As soon as a search is complete, the results appear on screen in a Results Page. The results list is initially sorted by Hit Density and includes the Publication Number, the Document Title and the number of pages in each document. Click any line in the results list to view the associated document image.

To re-sort the list in ascending order, click the list header (in bold blue at the top of the results box). You can also add any or all of the result documents to a private list for future reference, called your "Shopping Cart". Documents can also be added to your Shopping Cart from the Document Display itself.



View a Document

After you select a document from the results list, a Document Display page appears showing the contents of the file. You can move through the document with the mouse and the controls along the top and bottom of the page display.

Finding your place in a document is now much easier. You can use individual page links (numerics) above the control bar or the page number entry box at the bottom of the page. You can use buttons to move either forwards or backwards through the document results list or retreat to the results list and select a new document from there. You can even request that all page images be dumped to your browser one after another all on the same page.

The yellow control bar contains buttons to navigate to any desired part of a document. There are page controls that move sequentially forward and back, as well as a new feature that lets the viewer move from hit to hit. When you display a document found by a search, you can easily locate the search term with the "Next page with a hit" buttons. Better still, each term found is displayed as if marked with a yellow highlighter on the image of the page.

Document key fields and properties

Selecting the Document Information button on the document display control bar expands the display to include information about the current publication that may occasionally be useful. The key fields represent metadata provided at scan time that describes the document, including: publication number, title, number of pages, orientation, source format, scan date, origin and file format. The document properties show: the file name of the document, the ranking of the document, the number of pages within the document, the number of hits, the total size of the document, the date and the index in which the document is filed.


Number of Hits and Hit density give you information about the richness of the results. Number of Hits shows you the number of terms that the EPA Publications Server located in each file. Hit Density is a number that gives the relative density of hits in a file. It is a function of the number of hits and the size of the file. For example, if two files each have four hits, and Hit Density of 32 and 52 respectively, the file with the higher density score is the smaller of the two files.
 
 

An additional control bar at the bottom of each display minimizes the need to scroll the display when navigating within a document. In addition to the extra control bar, the bottom of each page also has a box for entering a specific page number to jump to within the current document.




Advanced Search

For those who require more specific search control, the Advanced Search page provides the means to change system parameters and a gateway to other specifically targeted input pages.

The Advanced Search page lets you change the way results and documents are presented and displayed. You can choose whether a selected document opens at the first hit or at the first page of the document. You can also change the number of results shown in one result list (10, 25, 50 or 100 results).

The results can be sorted in ascending or descending order by hit density, number of hits, file size, file date, comment, file name or file path. The document display options let you choose what to display and at what quality: the images, the original formats (with highlights and document properties) and the image quality.

The Clear Settings button restores all settings to their default values.




Fields Search

Fields are physically defined areas of text files. You can limit a search to such an area. The EPA Publications Web Server retrieves a file only if it contains a search term within the area defined as the field.

To choose a Field for use in a search statement, select one of the definitions and click on it to add it to the text in the search statement text box. The Field name appears in the text box, followed by a pair of braces, with the cursor between them. This means that the program is waiting for you to type the search term to find in the part of the source file marked by the Field delimiters.

The Fields Page displays all fields that have been defined in the chosen index. Tick the box of the field category you want to use, fill in a field value, and press Search. Fields can be combined with each other, and with an open full-text query, using the AND and OR operators.




Publication Title Index

Title index pages simply provide static lists of documents representing the complete inventory. Lists are ordered by publication number for the associated Office of origin.

Unless you have a broadband connection and a specific need for the comprehensive list, we recommend that you use the Fields Search to find publications by number instead.



Search Techniques

You can search for anything in the text of your documents, using our many search techniques.

Learning to make effective searches is the most challenging part of "finding." With the wide range of techniques, you can carry out almost any kind of search. On some occasions your quests for information will be exploratory in nature; you will want to acquire a sense of the "universe" of available information. At other times, you will make very precise searches to locate a limited amount of specific information.

To make effective searches you need to become thoroughly familiar with all search techniques. Examples demonstrate not only how to use each technique, but also how to combine them into complex search statements.

Refer to each search technique below for further information.



Content Words and Phrases

The simplest search statement contains a single content word or character string. For example, to retrieve all information in your files about Chicago, type the search statement:

chicago

directing the EPA Publications Web Server to retrieve every source document with the word chicago.

A content phrase consists of two or more content words appearing together, that is, without intervening operators such as AND or OR. The EPA Publications Web Server treats content phrases as one entity. The search statement:

chicago cubs

retrieves only those files with cubs immediately following chicago. A phrase can contain one or more noise words, for example:

Billy the Kid

The EPA Publications Web Server ignores the noise word, the, in this phrase. If you want to search for two words that do not form a phrase, connect them with a Boolean operator, either AND or OR, for example:

cleveland OR detroit

The EPA Publications Web Server will retrieve all documents that mention one or both cities. Refer to Boolean operators for additional information.



Wild Cards

Wild card symbols added to content words lend a great deal of flexibility to search statements. Use wild cards to search for prefixes, roots and suffixes, and to find variations in the spelling of a word. The EPA Publications Web Server uses two wild card symbols: ? and *.

Question mark ( ? ) replaces a single character, for example:

b?rn retrieves born and barn and burn.
?andy retrieves candy and dandy and sandy.

You can use more than one question mark in a word, for example:

sh??e retrieves shore and shade.

When you use ? the program retrieves only files containing words with exactly the same number of characters. For example, a search for 6060 without a wild card would not retrieve the zip code, 60607. A search for 60607 would retrieve only that zip code. Searching for 6060? would retrieve zip codes 60600 through 60609.

Asterisk ( * ) replaces zero or more characters, for example:

*vert retrieves convert and revert.

Use care when crafting search statements with multiple character wild cards to avoid results not related to the search topic. For example, to find information about automobiles, the search statement, auto*, would retrieve auto, automobile, and automotive. It would also retrieve autobiography, autocracy and autograph. A more specific search statement would be auto OR automo*.
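
One way to picture how the two wild cards behave is to translate a search term into a regular expression. The Python sketch below is an illustration only, not the server's implementation. The function names wildcard_to_regex and matches are invented for this example, and the sketch assumes that a word consists of letters and digits.

```python
import re

def wildcard_to_regex(term):
    """Translate a search term with ? and * wild cards into a compiled regex."""
    parts = []
    for ch in term:
        if ch == "?":
            parts.append(r"\w")       # exactly one character
        elif ch == "*":
            parts.append(r"\w*")      # zero or more characters
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def matches(term, word):
    """Return True if the whole word matches the wild card term."""
    return wildcard_to_regex(term).fullmatch(word) is not None

print(matches("b?rn", "barn"))          # True
print(matches("6060?", "60607"))        # True
print(matches("6060", "60607"))         # False: no wild card, so the lengths must agree
print(matches("auto*", "autograph"))    # True: the unrelated match described above
print(matches("automo*", "autograph"))  # False
```

The last two lines show why auto OR automo* is the more specific statement: automo* no longer matches autograph, while the plain term auto still matches the word auto itself.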



Boolean Operators

A search statement with only one content word retrieves every file containing that word. When you want to use more than one term in a search statement, insert operators between terms to indicate a relationship. The EPA Publications Web Server retrieves only files that meet the conditions of that relationship. You can use OR, AND and NOT.

•   The Search Operator OR

OR instructs the program to retrieve files with at least one term from the search statement. OR enlarges the search topic; use it to look for terms that have similar meaning, or refer to similar subjects. The search statement:

car OR transportation

retrieves all files with one or both terms: car or transportation. This search statement is more thorough and complete than if either word were used alone.

You can combine wild card characters with the OR operator in search statements containing content words with similar meaning for more complete results. For example:

universit* retrieves both university and universities.

If the search topic is higher education in general, this is a better search statement:

college OR universit* OR higher education



•   The Search Operator AND

The operator AND searches for files with terms found on both sides of AND in the search statement. While the operator OR broadens the search topic, AND narrows the topic. Use AND to connect terms with different meanings. Using this search statement:

new england AND north dakota

retrieved files contain at least one mention of each phrase. By contrast, in this search statement:

conservation OR irrigation

retrieved files need contain only one term from the search statement, although they may contain both. AND searches for occurrences of terms on both sides of the operator; retrieved files must contain both.



•   The Search Operator NOT

Use NOT to narrow the search topic. NOT stipulates that retrieved files must not contain the word immediately following NOT in the search statement. You can use NOT with AND or OR to form a single operator between two content words, for example:

bark AND NOT tree

You can use NOT alone when joining two content words, for example, ball not bat. In this example, NOT alone is equivalent to AND NOT.

To find all files with no mention of cars, use the search statement:

NOT cars

To locate information about cars but not used cars, use the search statement:

(cars) AND NOT used cars

Note: The order of content words in the search statement affects the result. Consider this statement:

used cars AND NOT cars

The program would retrieve no files, because every file referring to used cars also refers to cars.
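
The three Boolean operators can be pictured as operations on sets of file names: OR is a union, AND is an intersection, and NOT is a complement. The Python sketch below illustrates this on a made-up collection of three files. The file names, their contents and the helper files_with are assumptions for this example, and the sketch does not model noise words.

```python
import re

# Hypothetical three-file collection used only for this illustration.
FILES = {
    "a.txt": "New cars and trucks for sale",
    "b.txt": "Used cars at low prices",
    "c.txt": "Bicycle repair manual",
}

def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def files_with(phrase):
    """Return the names of files containing the phrase as consecutive words."""
    want = words(phrase)
    found = set()
    for name, text in FILES.items():
        have = words(text)
        if any(have[i:i + len(want)] == want
               for i in range(len(have) - len(want) + 1)):
            found.add(name)
    return found

ALL_FILES = set(FILES)

print(sorted(files_with("cars") | files_with("bicycle")))    # cars OR bicycle
print(sorted(files_with("cars") & files_with("trucks")))     # cars AND trucks
print(sorted(ALL_FILES - files_with("cars")))                # NOT cars
print(sorted(files_with("cars") - files_with("used cars")))  # (cars) AND NOT used cars
print(sorted(files_with("used cars") - files_with("cars")))  # used cars AND NOT cars
```

The last line prints an empty list, which matches the note above: every file that mentions used cars also mentions cars, so nothing is left after the exclusion.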



Positional Operators

Positional Operators identify either a required proximity between content words, or a content word's proximity to other document elements.

•   The WITHIN Operator: W/n

W/n limits the search to content words that appear within a defined range (n) of each other, in either direction. By contrast, AND, OR and NOT retrieve files if the search statement terms appear anywhere in the same text file. Within n means that up to n-1 words can intervene. "n" can be any integer from 1 to 16,382. Do not use a comma to punctuate the integer, as in the previous sentence.

When combining the W/n operator with other positional operators, the Within n relationship applies to adjacent components. Using the following as a search statement:

blue sky w/10 green grass w/10 clear water

in the retrieved text file, blue must be adjacent to sky; sky must be within 10 words of green; green must be adjacent to grass; grass must be within 10 words of clear; clear must be adjacent to water.




The WITHIN operator is especially useful when searching long documents. The search statement, lincoln AND illinois, retrieves a file even if Lincoln appears on page one and Illinois on page twenty. The search statement, lincoln W/10 illinois, requires that one word be within ten words of the other. This helps ensure that search terms are contextually related.

Example of W/n

Compare the following search statements for retrieving information from a company's internal sales reports:

client AND complaint

defines a broader search topic than:

client W/10 complaint

The AND operator retrieves any file with the term client if complaint is also present. When the operator W/10 replaces AND, the program retrieves only files that mention client within ten words of complaint.

Note: The position of content words connected by W/n does not affect search results. For example, 1983 W/8 tax* defines the same search topic as tax* W/8 1983.
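
The Within n test can be sketched as a comparison of word positions. The Python example below is a simplified illustration, not the server's implementation. The function name within is invented, only single-word terms are handled, and every word (including noise words) is assumed to count toward the distance.

```python
import re

def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def within(text, term1, term2, n):
    """Return True if term1 occurs within n words of term2, in either direction."""
    tokens = words(text)
    first = [i for i, w in enumerate(tokens) if w == term1]
    second = [i for i, w in enumerate(tokens) if w == term2]
    return any(0 < abs(i - j) <= n for i in first for j in second)

report = "The client called on Monday and filed a complaint about late delivery"

print(within(report, "client", "complaint", 10))  # True
print(within(report, "complaint", "client", 10))  # True: order does not matter
print(within(report, "client", "complaint", 5))   # False: six words intervene
```

Six words stand between client and complaint in the sample sentence, so the smallest n that matches is 7, in line with the rule that Within n allows up to n-1 intervening words.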




A special use of W/n combines it with one of these separators: sentence (EOS), paragraph (EOP) or page (EOG) to carry out a search with this format: term1 W/n/sep term2; for example:

Minnesota W/3/EOP Maine AND fishing

This search statement would retrieve files that mention fishing, and where Minnesota appears within three paragraphs of Maine. In another example:

Supreme Court W/5/EOG civil rights

In retrieved files, the phrase Supreme Court would be within five pages of the phrase civil rights.

In this use of W/n, instead of counting individual words, the program counts lines, sentences, paragraphs or pages to meet the criterion represented by "n."



•   The Precedes Operator: P/n

P/n works like W/n, with the added stipulation that the term preceding P/n in the search statement must also precede the other term in any retrieved file, within a range of n words. Using this search statement:

physical education P/100 fitness

The program retrieves files meeting two conditions:

  1. physical must be adjacent to education.
  2. education must precede fitness within 100 words.
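
The two conditions above can be sketched as a one-directional position check. The Python example below is an illustration under stated assumptions, not the server's implementation. The names phrase_positions and precedes are invented, and the distance is assumed to be measured from the last word of the first phrase to the first word of the second.

```python
import re

def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def phrase_positions(tokens, phrase):
    """Return (start, end) index pairs for each occurrence of the phrase."""
    want = words(phrase)
    return [(i, i + len(want) - 1)
            for i in range(len(tokens) - len(want) + 1)
            if tokens[i:i + len(want)] == want]

def precedes(text, first, second, n):
    """Return True if `first` ends at most n words before `second` begins."""
    tokens = words(text)
    return any(0 < start2 - end1 <= n
               for _, end1 in phrase_positions(tokens, first)
               for start2, _ in phrase_positions(tokens, second))

print(precedes("Physical education improves fitness",
               "physical education", "fitness", 100))  # True
print(precedes("Fitness is a goal of physical education",
               "physical education", "fitness", 100))  # False: wrong order
```

Unlike the W/n check, swapping the two terms changes the result, because the difference in positions must be positive.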

 



•   The operator TO

Use TO to search for occurrences of a term falling between two other terms. In the following search statement:

sales TO product {results}

The program searches for occurrences of results falling between occurrences of sales and of product. This technique is similar to a proximity search, but much more powerful. It highlights only the term results in retrieved files. Sales and product are not objects of the search, except as delimiters of the range for locating the term results.

 



Precedence and Parentheses

When you use two or more operators in a search statement, the EPA Publications Web Server must give one operator precedence over the other to resolve the meaning of the statement. It evaluates a statement in an order determined by operator precedence, but you can always override the normal order of evaluation by using parentheses, which have higher precedence than any operator.

The EPA Publications Web Server observes the following operator precedence from highest to lowest. Operators at the same level in the list are of equal precedence. The program evaluates them from left to right in the search expression:

  1. NOT
  2. OR 
  3. W/n P/n
  4. AND 
  5. TO

Why Use Parentheses?

Parentheses give you explicit control over the order of evaluation in complex search statements. When you use parentheses to group terms around operators, the EPA Publications Web Server interprets contents within parentheses as one unit. The use of parentheses is identical to that of algebra. We recommend that you always use parentheses when designing complex search statements (more than two operators). This helps ensure that searches function as expected.



Examples of Parentheses

To search for files that discuss both cars and sales, or that mention a car dealer, use parentheses:

(cars AND sales) OR car dealer

First the EPA Publications Web Server evaluates the expression within parentheses and finds all files that contain both terms. It then adds the files that contain the phrase car dealer. Without the parentheses, OR would take precedence over AND, and the statement would be read as cars AND (sales OR car dealer).

You can use multiple sets of parentheses within one search statement:

(disk drive AND printer AND modem) OR (sales AND revenue AND profit)

The program retrieves files with all terms from at least one set of parentheses within the search statement. You can also nest parentheses, for example:

((cars AND trucks) OR trains) AND (ships OR submarines)

Note that AND is the primary operator. Only files that satisfy conditions on both sides of the statement are retrieved. If you had used OR as the primary operator, the program could retrieve files that satisfy conditions on only one side of the statement.



Examples of Precedence Ordering

Because OR has precedence over AND, EPA Publications Web Server interprets the search statement:

chicago OR los angeles AND new york

to be the same as

(chicago OR los angeles) AND new york

and looks for files that mention either Chicago AND New York, or Los Angeles AND New York.

Parentheses can override precedence, for example:

chicago OR (los angeles AND new york)

Because parentheses have the highest precedence, the EPA Publications Web Server locates only files that mention Chicago alone, both Los Angeles AND New York, or all three. It would not retrieve files that contain New York alone or Los Angeles alone.
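
The effect of precedence can be reproduced with a small evaluator in which NOT binds tightest, then OR, then AND, and parentheses override all three. The Python sketch below is an illustration only, not the server's parser. It leaves out W/n, P/n and TO, it recognizes operators only in upper case (unlike the real server, which is not case sensitive), and the sample collection DOCS is made up for this example.

```python
import re

OPERATORS = {"AND", "OR", "NOT", "(", ")"}

def tokenize(statement):
    """Split a statement into operators, parentheses and lowercase phrases."""
    tokens, phrase = [], []
    for tok in re.findall(r"\(|\)|[^\s()]+", statement):
        if tok in OPERATORS:
            if phrase:
                tokens.append(" ".join(phrase).lower())
                phrase = []
            tokens.append(tok)
        else:
            phrase.append(tok)
    if phrase:
        tokens.append(" ".join(phrase).lower())
    return tokens

def search(statement, docs):
    """Return the set of file names in docs that satisfy the statement."""
    tokens = tokenize(statement)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_and():                  # lowest precedence of the three
        result = parse_or()
        while peek() == "AND":
            take()
            result = result & parse_or()
        return result

    def parse_or():
        result = parse_not()
        while peek() == "OR":
            take()
            result = result | parse_not()
        return result

    def parse_not():                  # highest precedence of the three
        if peek() == "NOT":
            take()
            return set(docs) - parse_not()
        return parse_atom()

    def parse_atom():
        tok = take()
        if tok == "(":
            result = parse_and()
            take()                    # consume the closing parenthesis
            return result
        return {name for name, text in docs.items() if f" {tok} " in f" {text} "}

    return parse_and()

# Hypothetical collection; each text is already lowercase and unpunctuated.
DOCS = {
    "1.txt": "chicago and new york",
    "2.txt": "los angeles and new york",
    "3.txt": "chicago",
    "4.txt": "new york",
}

print(sorted(search("chicago OR los angeles AND new york", DOCS)))
print(sorted(search("chicago OR (los angeles AND new york)", DOCS)))
```

The first statement returns files 1 and 2 because OR is evaluated before AND. The second returns files 1, 2 and 3, and still leaves out file 4, which mentions New York alone.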



Number Range Operator

You can search for numbers both as "terms," that is, alphanumeric character strings and as numeric values. To locate a number as a term without regard to its value, enclose it in double quotes in the search statement, for example:

jones and "60615"

Use this search statement to retrieve letters to someone named Jones whose zip code is 60615.

If you omit the quotation marks, the program searches for the value 60615 and all equivalent values, for example, 60615.00. When you use quotes, the search is limited to the enclosed character string.

You can use these math operators in number range searches:

  • < less than
  • <= less than or equal to
  • = equal to
  • <> not equal to
  • > greater than
  • >= greater than or equal to

The following are examples of number range search statements:

>= 65 w/10 social security

> 21 AND high school graduate



Use number range search statements to locate a value falling between two other values in the following format:

> or >= lower value : < or <= higher value

For example, the following search statement:

>1 : <10

would locate every number in the index meeting both conditions: greater than 1 and less than 10, whether integer or decimal. Searches of this type take time to execute, because every number in the index must be examined. If your document collection has more than a few thousand numbers, this kind of search can take too long, and may fail with an error due to lack of system resources.

The search statement, < > 5, is treated as identical to NOT 5.
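
The difference between searching for a number as a value and as a quoted term, and the meaning of a range such as >1 : <10, can be sketched as follows. This Python example is an illustration only, not the server's implementation. The function names are invented, and negative numbers and thousands separators are not handled.

```python
import re

def numeric_values(text):
    """Return every number in the text as a float."""
    return [float(m) for m in re.findall(r"\d+(?:\.\d+)?", text)]

def has_value(text, value):
    """Unquoted search: match by numeric value, so 60615 also matches 60615.00."""
    return value in numeric_values(text)

def has_term(text, term):
    """Quoted search: match the exact character string only."""
    tokens = [t.rstrip(".") for t in re.findall(r"[\w.]+", text)]
    return term in tokens

def values_between(text, low, high):
    """Range search in the form >low : <high, with both bounds exclusive."""
    return [v for v in numeric_values(text) if low < v < high]

print(has_value("Total due: 60615.00", 60615))          # True
print(has_term("Total due: 60615.00", "60615"))         # False
print(has_term("Jones, Chicago IL 60615.", "60615"))    # True
print(values_between("Samples 1, 2.5, 7 and 10", 1, 10))  # [2.5, 7.0]
```

Because a range search has to test every number it finds, the cost grows with the count of numbers in the collection, which is the reason for the performance warning above.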



Quorum Operator

The quorum operator searches for a specified number of the terms in a search statement, from one to all of them, in the following format:

n of {term, term, .....}

where "term" is a single character string or a phrase. With the following search statement:

3 of {history, english, social studies, geography, humanities, psychology}

you could search a collection of resumes to locate applicants prepared to teach in a certain number of fields from a range of options.

When n = 1, the program converts the expression within braces to a series of content words joined by OR, and retrieves a text file even if it contains only one term from the search statement, for example:

1 of {mechanical drawing, drafting, prototype design, modeling}

When n equals the number of terms within braces, the expression is converted to a series of ANDs, and a text file is retrieved only if it contains all terms from the search statement, for example:

3 of {word proc*, desktop pub*, spreadsheet}
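
A quorum search can be pictured as counting how many of the listed terms occur in a file and comparing that count with n. The Python sketch below is an illustration only. It assumes that "n of" means at least n terms, which is consistent with the n = 1 and n = all cases described above, and it does not handle wild cards. The function names and the sample text are invented for this example.

```python
import re

def words(text):
    """Split text into lowercase words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains(tokens, phrase):
    """Return True if the phrase occurs in tokens as consecutive words."""
    want = words(phrase)
    return any(tokens[i:i + len(want)] == want
               for i in range(len(tokens) - len(want) + 1))

def quorum(n, terms, text):
    """Return True if at least n of the terms occur in the text."""
    tokens = words(text)
    return sum(contains(tokens, term) for term in terms) >= n

fields = ["history", "english", "social studies",
          "geography", "humanities", "psychology"]
resume = "Certified to teach history, geography and social studies."

print(quorum(3, fields, resume))            # True: three of the six fields appear
print(quorum(4, fields, resume))            # False
print(quorum(1, fields, resume))            # True: behaves like OR
print(quorum(len(fields), fields, resume))  # False: behaves like AND
```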



Separators

The EPA Publications Web Server recognizes these separators:
EOG end of page
EOL end of line
EOP end of paragraph
EOS end of sentence

They limit a search to a physically defined range of a text file. In this sense, they are similar to proximity search statements. Separators are very useful when combined with the TO operator. For example, use the search statement:

experience TO EOP {(driver or chauffeur) and >= 3}

to locate resumes of persons with a minimum of three years' experience as a driver.

To locate a single paragraph that includes two terms, use a search similar to this:

EOP TO EOP {economic and policy}

Note:

If you want to search for any of the separators as text strings, enclose them in quotes, for example, "EOG". If you do not do this, the search results will contain every file that has the End Of Page marker, which is, of course, every file.



Fuzzy Searches

A fuzzy search can locate all occurrences of a word, plus all other words that are "close" in spelling to the original word. You specify the degree of closeness to the original word.

Examples of Fuzzy

Think of fuzzy search in terms of how similar one word is to another. To change one word into another, you can add, delete and replace single characters. A single degree is one change of one character. For example:

To change "commuter" into "computer" requires one replacement: the second "m" with "p." One degree.

To change "computw" into "computer" requires one replacement and one addition: replace "w" with "e" and add "r." Two degrees.

To change "coinputer" into "computer" requires one replacement and one deletion: replace "i" with "m," and delete "n." Two degrees.

The higher the degree, the greater the margin of error; the lower the degree, the less leeway is allowed in matching a search term with words in your files.

Degree of Fuzzy

Degree of Fuzzy ranges from 1 to 4 by default. We recommend that you set Degree to 2 for searching normal text. This provides for mistakes that occur in scanned text because of broken and joined characters. If you need to search for long words, set Degree to 3 or 4.

An additional constraint takes into account the length of the word you are searching for, to prevent the retrieval of too many irrelevant shorter words. This constraint limits the degree for a specific word to the lesser of the Fuzzy Degree setting and 0.5 times the word's length. For example, if you set Fuzzy Degree to 4 and the search term is six characters long, the actual Degree of Fuzzy will be 0.5 x 6 = 3 rather than 4.
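
Counting single-character additions, deletions and replacements is what is commonly called the edit (Levenshtein) distance. The Python sketch below computes it and applies the length constraint described above. It is an illustration only: the server's own matching algorithm is not documented here, the function names are invented, and rounding down for words of odd length is an assumption.

```python
def edit_distance(a, b):
    """Count the single-character additions, deletions and replacements
    needed to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # add cb
                previous[j - 1] + (ca != cb),  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]

def effective_degree(setting, word):
    """Lesser of the Fuzzy Degree setting and half the word length
    (rounded down, which is an assumption for odd lengths)."""
    return min(setting, len(word) // 2)

def fuzzy_match(term, word, setting=2):
    """Return True if word is within the effective degree of term."""
    return edit_distance(term, word) <= effective_degree(setting, term)

print(edit_distance("commuter", "computer"))   # 1
print(edit_distance("computw", "computer"))    # 2
print(edit_distance("coinputer", "computer"))  # 2
print(effective_degree(4, "search"))           # 3
print(fuzzy_match("computer", "coinputer"))    # True at the recommended setting of 2
```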



Search rules and conventions

  1. With the exception of NOT, place operators only between search terms, and never at the beginning or end of a search statement.
  2. Use NOT in conjunction with a single content word, for example: NOT car
    NOT may never appear at the end of a search statement. You may also use NOT with a phrase in parentheses, for example: NOT (new york)
  3. With the exception of NOT, two operators cannot appear in sequence in a search statement. You can use the NOT operator with AND and OR, that is, AND NOT and OR NOT.
  4. Because all operators are noise words, you cannot use them as content words in search statements. For example, the search statement, and OR or, will not be accepted.
  5. The EPA Publications Web Server is not case sensitive; it regards uppercase and lowercase letters as identical. We show operators in upper case for emphasis and clarity.
  6. An operator can appear more than once in a search statement.
  7. The W/n operator must include an integer in the range 1 to 16,382, followed by a space and a content word. Omit the comma in the integer.
  8. You can use one term to retrieve both the hyphenated and non-hyphenated spellings of a term; for example, the search term:
    • database retrieves database and data-base, but not data base
    • data-base retrieves data-base, but not database and data base
    • data base retrieves data-base and data base, but not database.

    When a multi-syllable word begins near the end of a line, a word processor may force hyphenation. The EPA Publications Web Server can find such a word in either its hyphenated or non-hyphenated form. As a side effect of this capability, searches with duplicate words in series also find single occurrences of that word; for example, the search statement, sing sing, would find single occurrences of sing as well as the phrase, sing sing. The program recognizes words with normally appearing hyphens, for example, Winston-Salem.

  9. The EPA Publications Web Server recognizes all printable characters in the ASCII character set.
  10. The EPA Publications Web Server ignores a sentence-ending period and other trailing punctuation marks when a space or a carriage return follows. The program recognizes periods when followed by a character, as in I.B.M. or in 292.004. It treats apostrophes as null characters, and ignores them.



Off-line Text Search

Text searches can be performed off-line using either the raw OCR (Optical Character Recognition) text file (Scanned OCR text), or the combination of Tagged Image Format (TIF) files and Microsoft® Office Document Imaging™ tools.

From the Document Display page, open any on-line document as a TIF file using the Multipage TIF button.
image - Text Search, create TIFF




Save the TIF file to your local drive. Later you may open the saved TIF using Microsoft® Office Document Imaging™ and perform simple text searches (Edit | Find) without an Internet connection.

image - Text Search, find text

