NIH LISTSERV

Searching list archives


How does it work?
Advanced searches
Search tips
Non-English searches



How does it work?

A search can be as simple as typing a single word in the "Search for:" box and clicking on "Start the search," or it can involve the full power of LISTSERV's database functions. Here are a few examples of simple searches (the text of the example should be entered in the "Search for:" box, and none of the other boxes should be filled in):
  • To search for messages about John Kennedy, simply type John Kennedy in the search box. This will show all the messages that contain the words "John" and "Kennedy" close to each other.

  • You could also type 'John Kennedy' (in single quotes) to search for that exact phrase, but this would not show messages about "John F. Kennedy".

  • For better results, you could use (John Kennedy) or JFK so that you also get the messages that say "JFK".

  • To search for words that are not necessarily close to each other, use "AND". For instance, Mozart and Beethoven would show all the messages that mention both composers, whereas Mozart Beethoven would only find a small fraction of them.

  • To make a search case sensitive, enclose it in double quotation marks. If you are interested in the works of Norman Mailer, you will probably find that searching for Mailer returns a lot of unexpected messages, whereas "Mailer" gives much better results.

  • You can get as sophisticated as you want: ((John Kennedy) or JFK) and not ((Bay Pigs) or Cuba) would look for messages about JFK that do not mention Cuba or the Bay of Pigs.

  • Some characters have special syntactical meaning to the database functions and must be enclosed in single quotes for correct results. For instance, parentheses need to be quoted in this manner: search for 'f(x)' instead of f(x).
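The difference between these forms can be illustrated with a short Python sketch of three matching rules: words close to each other (the plain form), an exact phrase in single quotes, and AND. This is a model, not LISTSERV's implementation. The function names, the word tokenizer and the five-word proximity window are assumptions, since the page only says that the words must be "close to each other".

```python
import re


def tokenize(text: str, case_sensitive: bool) -> list[str]:
    # ASSUMPTION: a "word" is a run of letters or digits.
    words = re.findall(r"[A-Za-z0-9]+", text)
    return words if case_sensitive else [w.lower() for w in words]


def near(text: str, terms: list[str], window: int = 5,
         case_sensitive: bool = False) -> bool:
    """Plain form (John Kennedy): all terms fall inside some run of `window` words.

    ASSUMPTION: the window size is hypothetical.
    """
    words = tokenize(text, case_sensitive)
    terms = terms if case_sensitive else [t.lower() for t in terms]
    return any(all(t in words[i:i + window] for t in terms)
               for i in range(len(words)))


def phrase(text: str, terms: list[str], case_sensitive: bool = False) -> bool:
    """Single quotes ('John Kennedy'): the words must be adjacent and in order."""
    words = tokenize(text, case_sensitive)
    terms = terms if case_sensitive else [t.lower() for t in terms]
    n = len(terms)
    return any(words[i:i + n] == terms for i in range(len(words) - n + 1))


def all_of(text: str, terms: list[str], case_sensitive: bool = False) -> bool:
    """AND (Mozart and Beethoven): each term may appear anywhere in the message."""
    words = set(tokenize(text, case_sensitive))
    terms = terms if case_sensitive else [t.lower() for t in terms]
    return all(t in words for t in terms)
```

Under these assumptions, "Photos of John F. Kennedy in Dallas" satisfies near() for John and Kennedy but not phrase(), and a message that mentions Mozart and, several words later, Beethoven satisfies all_of() but not near(). Passing case_sensitive=True plays the role of the double quotation marks in the "Mailer" example.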

Advanced searches

In the previous section, we discussed how to make a simple (or even complex) search using the "Search for:" box. While this is sufficient for most searches, the other search options can be used to further restrict the scope of your search and make it easier for you to find what you are looking for.

The substring search checkbox

By default, searches will only match full words: searching for planet will not find messages containing the word "planetarium" (unless they also contain the word "planet"). But if you check the "substring search" box, your search will match any word containing the string you have entered. For instance, a substring search for chem would find both "chemistry" and "alchemy."
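A minimal sketch of the two matching modes, assuming that words are runs of letters and digits compared without regard to case. The function name matches() is hypothetical and this is not LISTSERV's code.

```python
import re


def matches(text: str, term: str, substring: bool = False) -> bool:
    # ASSUMPTION: words are runs of letters/digits, compared case-insensitively.
    words = [w.lower() for w in re.findall(r"[A-Za-z0-9]+", text)]
    term = term.lower()
    if substring:
        # "substring search" checked: the term may occur anywhere inside a word.
        return any(term in w for w in words)
    # Default: the term must equal a whole word.
    return term in words
```

For example, matches("A trip to the planetarium", "planet") is false, while the same call with substring=True is true, and a substring search for chem matches both "chemistry" and "alchemy".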

The subject search box

To restrict your search to messages whose subject contains specific search words, simply type them in the subject search box. The syntax is the same as for the "Search for:" box, with one difference: the "AND" operator is redundant, because a subject field is very short and all the words are considered to be "close" to each other. Thus, in the subject box there is no difference between a search for Mozart and Beethoven and a search for Mozart Beethoven.

Subject searches are a good alternative when searching large archives, or when searching for topics that are mentioned quite often. If a word that you are looking for appears in the subject of a message, it is much more likely to reflect the actual contents of the message than if it only appears in one isolated sentence. On the other hand, maybe what you are looking for is hidden in a message that was about something else, and where someone just happened to mention your topic of interest in passing.

The author search box

You can also restrict your search to messages posted by a particular person. If you know the e-mail address of the person who wrote the message you are interested in, this can be a very effective way to find what you are looking for, without having to go through dozens of unrelated messages. Note that you do not need to know the exact e-mail address. For instance, if you know that the userid is "john" and the host name is some machine at XYZ.COM, you can simply enter john xyz.com in the author search box. Since the author's e-mail address is a single word, there is no concept of "close" vs. "distant," and the AND operator is redundant: john xyz.com and john and xyz.com are equivalent.

Whatever you do, do not try to use wildcards (e.g. "john@*.xyz.com") as this is not the correct syntax. The author search box uses the same syntax as the subject and "Search for:" boxes.

The "since" and "until" search boxes

It is not uncommon for popular mailing lists to have archives spanning 10 or more years of activity. If the mailing list is about technology, you may not be interested in messages that are older than a few years. Alternatively, you may know approximately when the information you are looking for was posted to the list. You can use the "Since" and "Until" boxes to restrict your search accordingly.

The syntax is very flexible and you can specify a date and/or time in just about any of the commonly used formats:

  • 23 Jun 1986 (self-explanatory).
  • 1986-06-23 (international date format).
  • 1995 or just 95 selects 1 Jan 1995 for the "since" box or 31 Dec 1995 for the "until" box.
  • APR selects April of the current year, 1st or 30th depending on whether this was entered in the "since" or "until" box.
  • APRIL 95 – same as above, but for the year 1995.
  • TODAY-7 (7 days ago) makes it easy to get a list of all the messages posted in the past week. You can also use YESTERDAY or TODAY for a shorter time span.
IMPORTANT: The US date format (mm/dd or mm/dd/yy) is not supported because it is ambiguous. Many other countries use dd/mm or dd/mm/yy instead, and to avoid ambiguities LISTSERV only supports the international date format, yyyy-mm-dd or yy/mm/dd.
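The expansion rules above can be sketched in Python. This is an illustration under assumptions, not LISTSERV's parser: the function names are hypothetical, only the formats listed on this page are handled, and the two-digit-year pivot at 50 is a guess, because the page does not say how a year such as 05 is expanded.

```python
import calendar
import re
from datetime import date, timedelta

MONTHS = ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"]


def month_number(word: str) -> int:
    # "APR" and "APRIL" both map to 4 (only the first three letters are compared).
    if len(word) < 3 or word[:3] not in MONTHS:
        raise ValueError(f"unknown month: {word!r}")
    return MONTHS.index(word[:3]) + 1


def full_year(y: int) -> int:
    # ASSUMPTION: two-digit years pivot at 50 (95 -> 1995, 05 -> 2005).
    if y >= 100:
        return y
    return 1900 + y if y >= 50 else 2000 + y


def resolve(spec: str, box: str, today: date) -> date:
    """Turn a "Since" or "Until" entry into a date; box is "since" or "until".

    Partial dates expand to the first day of the period for "since"
    and to the last day for "until".
    """
    s = " ".join(spec.upper().split())
    since = box == "since"

    m = re.fullmatch(r"(TODAY|YESTERDAY)(?:-(\d+))?", s)  # TODAY-7, YESTERDAY
    if m:
        base = today if m.group(1) == "TODAY" else today - timedelta(days=1)
        return base - timedelta(days=int(m.group(2) or 0))

    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", s)  # yyyy-mm-dd
    if m:
        y, mo, d = map(int, m.groups())
        return date(y, mo, d)

    m = re.fullmatch(r"(\d{2})/(\d{2})/(\d{2})", s)  # yy/mm/dd, never mm/dd/yy
    if m:
        y, mo, d = map(int, m.groups())
        return date(full_year(y), mo, d)

    m = re.fullmatch(r"(\d{1,2}) ([A-Z]+) (\d{4}|\d{2})", s)  # 23 Jun 1986
    if m:
        return date(full_year(int(m.group(3))), month_number(m.group(2)),
                    int(m.group(1)))

    if re.fullmatch(r"(?:\d{4}|\d{2})", s):  # 1995 or 95
        y = full_year(int(s))
        return date(y, 1, 1) if since else date(y, 12, 31)

    m = re.fullmatch(r"([A-Z]+)(?: (\d{4}|\d{2}))?", s)  # APR, APRIL 95
    if m:
        y = full_year(int(m.group(2))) if m.group(2) else today.year
        mo = month_number(m.group(1))
        last = calendar.monthrange(y, mo)[1]
        return date(y, mo, 1 if since else last)

    raise ValueError(f"unsupported date: {spec!r}")
```

With today = date(2024, 3, 15), resolve("APR", "until", today) gives 30 April 2024 and resolve("TODAY-7", "since", today) gives 8 March 2024. A US-style entry such as 06/23/86 is read as yy/mm/dd and rejected, because 23 is not a valid month.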

Search tips

Here are a few tips which may prove useful if you are not getting anywhere with your search.
  • In most cases, you will save a lot of time by using the "Since" and "Until" boxes to narrow your search to a particular date range, even if it is very approximate.
  • If you know the author of the message and have his e-mail address, use the author search box to restrict your search.
  • If you know the author's name, but not his e-mail address, add his name to the "Search for:" box. With luck it will appear somewhere in the message header or text, and this will help narrow the search. Make sure to keep the name clearly separate from the rest of the search. If you were looking for computer stores and know that the message you are looking for was written by Mary Travis, your new search should be for (computer stores) and (Mary Travis). If you just search for computer stores Mary Travis, the four words will have to be close to each other or there will be no match.
  • Make sure to read the notes on non-English searches if you are conducting a search in a language that uses non-English characters.
  • An easy way to find recent messages is to run a search with TODAY-7 in the "Since" box, leaving all the other boxes empty. You can add the resulting URL to your hotlist (bookmarks) and come back to it regularly to see all the messages posted in the past week.

Non-English searches

Every effort has been made to make ISO-8859-* searches work as transparently as possible, in spite of the complexity of the situation. To understand the cases where searches do not work as expected, you should know that messages are archived in the format in which they were originally sent. This will typically include a mix of native 8-bit text, MIME quoted-printable text, MIME base64 text, and other proprietary encoding methods such as WINMAIL.DAT, plus of course 7-bit text. Each of these formats presents its own challenges:
  • Native 8-bit text normally produces the expected results. See below for a list of generic problems that may affect even native 8-bit text.
  • MIME quoted-printable text will, in most cases, produce the expected results. Conceptually, the search is carried out as though the =xx escape sequences had been replaced with their corresponding characters before beginning the search. However, soft line breaks (trailing '=' signs) are not processed (the lines are not merged). If the poster's mail client uses soft line breaks to split words in the middle, they will not be recognized. For instance, if the word "house" were written as "hou=" on one line followed by "se" on the next line, LISTSERV would not find a match with the search string "house".
  • MIME base64 text is not supported by the search interface. This type of encoding should only be used for binary data, because it is totally unintelligible to people without a MIME user interface and because it is context sensitive (that is, LISTSERV would have to decode the entire message before beginning the search).
  • Proprietary encoding methods such as WINMAIL.DAT are not supported by the search interface. In most cases, these formats suffer from the same kind of problems as MIME base64 text, and the mail programs that generate these messages are being replaced with MIME-capable programs.
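  • 7-bit text with national characters (for example, national variants of ISO 646, in which codes such as [ or { stand for accented letters) does not work at all. It is impossible to translate this text to native 8-bit form without knowing the language in which it is written.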
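The quoted-printable and base64 limitations can be reproduced with Python's standard quopri and base64 modules. The sketch below only models the behaviour described above and is not LISTSERV's code; the sample message and the function names are hypothetical.

```python
import base64
import quopri

# Hypothetical quoted-printable body: the sender's mail client split "house"
# with a soft line break (trailing "=") and encoded "é" as =E9.
QP_BODY = b"We bought a new hou=\nse near the caf=E9.\n"


def search_each_line(body: bytes, word: bytes) -> bool:
    # Model of the limitation: =XX escapes are decoded, but each line is
    # searched on its own, so soft line breaks never join two lines.
    return any(word in quopri.decodestring(line) for line in body.split(b"\n"))


def search_decoded(body: bytes, word: bytes) -> bool:
    # Decoding the whole body first joins "hou=" + "se" back into "house".
    return word in quopri.decodestring(body)


def search_base64_naively(encoded: bytes, word: bytes) -> bool:
    # Encode the search word by itself and look for it in the encoded text.
    # Base64 works on 3-byte groups, so this only succeeds when the word
    # happens to start on a group boundary: the encoding is context sensitive.
    return base64.b64encode(word) in encoded
```

Here search_each_line(QP_BODY, b"house") fails while search_decoded() succeeds, and the =E9 escape is still found line by line. For base64, the word "house" is found in the encoding of b"my house" but not in that of b"a house", which is why the whole message would have to be decoded before searching.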

In addition, there are a number of generic problems that affect all message formats:

  • Code page: a typical international archive will contain messages in a variety of incompatible code pages (Latin-1, Icelandic, etc.). While LISTSERV knows the code page of each individual message, it does not know the code page of the search string you are entering, nor does it support searches that span multiple code pages. If you search for one of the characters in the Icelandic code page, LISTSERV may incorrectly match messages written in another code page that lacks this character but contains a different character with the same binary code.
  • Case-insensitive searches: special tables are required to properly evaluate case-insensitive searches with non-ASCII characters. The tables LISTSERV uses were designed for the Latin-1 (ISO-8859-1) code page and may not give correct results with other code pages.
  • EBCDIC systems: LISTSERV servers running on EBCDIC systems may give incorrect results due to the multiple ASCII-EBCDIC translation steps involved in processing your request. The TCP/IP product, the SMTP server, the web server and LISTSERV each have their own tables, which may or may not be identical.
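The code page and case-table problems can be shown in a few lines of Python. This sketch swaps in other code pages for the page's Icelandic example: Latin-1 against Greek (ISO-8859-7) for the byte collision, and Cyrillic (ISO-8859-5) for case folding. Python's str.lower applied to Latin-1 text stands in for a Latin-1 case table; that substitution, the sample strings and the function names are assumptions for illustration.

```python
# The searcher types the Icelandic letter eth; in Latin-1 it is byte 0xF0.
QUERY = "ð".encode("latin-1")
# A Greek message: in ISO-8859-7 the letter pi is also byte 0xF0.
GREEK_MSG = "area = π r^2".encode("iso-8859-7")


def byte_search(message: bytes, query: bytes) -> bool:
    # Compares raw bytes and ignores which code page each side uses.
    return query in message


def decoded_search(message: bytes, msg_codepage: str, query: str) -> bool:
    # Decodes the message with its own code page before comparing characters.
    return query in message.decode(msg_codepage)


def latin1_lower(data: bytes) -> bytes:
    # Lower-cases bytes as if they were Latin-1 text, a stand-in for a
    # case table designed for ISO-8859-1.
    return data.decode("latin-1").lower().encode("latin-1")
```

byte_search(GREEK_MSG, QUERY) reports a false match, while decoded_search(GREEK_MSG, "iso-8859-7", "ð") correctly finds nothing. Applying latin1_lower to "МИР" encoded in ISO-8859-5 yields "МИр" rather than "мир": the Latin-1 table lowers byte 0xC0 because it treats it as "À", and leaves the other two Cyrillic capitals untouched.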


CIT
Center for Information Technology
National Institutes of Health
Bethesda, Maryland 20892
301 594 6248 (v) 301 496 8294 (TDD)