How does it work?

A
search can be as simple as typing a single word in the "Search for:" box and
clicking on "Start the search," or it can involve the full power of
LISTSERV's database functions. Here are a few examples of simple searches
(the text of the example should be entered in the "Search for:"
box, and none of the other boxes should be filled in):

- To search
for messages about John Kennedy, simply type John Kennedy in the
search box. This will show all the messages that contain the words "John"
and "Kennedy" close to each other.
- You could also type 'John
Kennedy', but this would not show messages about "John F. Kennedy".
- For better results, you could use (John Kennedy) or JFK so that
you also get the messages that say "JFK".
- To search for words that
are not necessarily close to each other, use "AND". For instance, Mozart
and Beethoven would show all the messages that mention both composers,
whereas Mozart Beethoven would only find a small fraction of
them.
- To make a search case sensitive, enclose it in double quotation
marks. If you are interested in the works of Norman Mailer, you will
probably find that searching for Mailer returns a lot of unexpected
messages, whereas "Mailer" gives much better results.
- You
can get as sophisticated as you want: ((John Kennedy) or JFK) and not
((Bay Pigs) or Cuba) would look for messages about JFK that do not
mention Cuba or the Bay of Pigs.
- Some characters have special
syntactical meaning to the database functions and must be enclosed in single
quotes for correct results. For instance, parentheses need to be quoted in
this manner: search for 'f(x)' instead of f(x).
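
The examples above combine a few building blocks: adjacent words, parenthesized groups, the and/or/not operators, and single quotes around special characters. As a rough illustration only, the following Python sketch assembles such expressions as plain strings that could be typed into the "Search for:" box. The helper names (quote_term, near, any_of, all_of, but_not) are hypothetical and not part of LISTSERV, and the set of characters treated as special is an assumption limited to the parentheses mentioned above.

```python
# Hypothetical helpers that only build search strings; they never contact LISTSERV.

# Assumption: only parentheses are treated as special here, because they are
# the one example given in the text. LISTSERV may reserve other characters too.
SPECIAL_CHARS = "()"


def quote_term(term: str) -> str:
    """Wrap a term in single quotes if it contains a special character."""
    if any(ch in SPECIAL_CHARS for ch in term):
        return f"'{term}'"
    return term


def near(*words: str) -> str:
    """Group words that must appear close to each other, e.g. (John Kennedy)."""
    return "(" + " ".join(quote_term(w) for w in words) + ")"


def any_of(*exprs: str) -> str:
    """Match messages that satisfy at least one sub-expression."""
    return "(" + " or ".join(exprs) + ")"


def all_of(*exprs: str) -> str:
    """Match messages that satisfy every sub-expression, anywhere in the text."""
    return "(" + " and ".join(exprs) + ")"


def but_not(expr: str, excluded: str) -> str:
    """Keep matches of `expr` that do not also match `excluded`."""
    return f"{expr} and not {excluded}"


jfk = any_of(near("John", "Kennedy"), "JFK")
cuba = any_of(near("Bay", "Pigs"), "Cuba")
query = but_not(jfk, cuba)
print(query)               # ((John Kennedy) or JFK) and not ((Bay Pigs) or Cuba)
print(quote_term("f(x)"))  # 'f(x)'
```

Building the string from small pieces keeps the parentheses balanced, which is the easiest thing to get wrong when typing a long expression by hand.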

Advanced searches

In the previous
section, we discussed how to make a simple (or even complex) search
using the "Search for:" box. While this is sufficient for most searches, the
other search options can be used to further restrict the scope of your
search and make it easier for you to find what you are looking for.

The substring search checkbox

By default,
searches will only match full words: searching for planet will not
find messages containing the word "planetarium" (unless they also contain
the word "planet"). But if you check the "substring search" box, your search
will match any word containing the string you have entered. For instance, a
substring search for chem would find both "chemistry" and
"alchemy."

The subject search box

To
restrict your search to messages whose subject contains specific search
words, simply type them in the subject search box. The syntax is the same as
for the "Search for:" box, with one difference: the
"AND" operator is redundant, because a subject field is very short and all
the words are considered to be "close" to each other. Thus, in the subject
box there is no difference between a search for Mozart and
Beethoven and a search for Mozart Beethoven. Subject
searches are a good alternative when searching large archives, or when
searching for topics that are mentioned quite often. If a word that you are
looking for appears in the subject of a message, it is much more likely to
reflect the actual contents of the message than if it only appears in one
isolated sentence. On the other hand, maybe what you are looking for is
hidden in a message that was about something else, and where someone just
happened to mention your topic of interest in passing.
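
The difference between the default whole-word matching and the substring search checkbox can be pictured with a small model. The sketch below is not LISTSERV's matching code: the function name matches is hypothetical, and it assumes that a "word" is a run of letters or digits and that matching ignores case.

```python
import re


def matches(text: str, term: str, substring: bool = False) -> bool:
    """Return True if `term` occurs in `text`.

    Assumption: words are runs of letters/digits and case is ignored.
    LISTSERV's real tokenizer may behave differently.
    """
    if substring:
        # Substring mode: the term may appear anywhere inside a word.
        return term.lower() in text.lower()
    # Default mode: the term must equal a complete word.
    words = re.findall(r"\w+", text.lower())
    return term.lower() in words


msg = "The planetarium offers alchemy and chemistry shows."
print(matches(msg, "planet"))                  # False
print(matches(msg, "planet", substring=True))  # True
print(matches(msg, "chem", substring=True))    # True
```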

The author search box

You can also restrict your search to
messages posted by a particular person. If you know the e-mail address of
the person who wrote the message you are interested in, this can be a very
effective way to find what you are looking for, without having to go through
dozens of unrelated messages. Note that you do not need to know the exact
e-mail address. For instance, if you know that the userid is "john" and the
host name is some machine at XYZ.COM, you can simply enter john
xyz.com in the search box. Since the author's e-mail address is a
single word, there is no concept of "close" vs. "distant," and the
AND operator is redundant: john xyz.com and john and
xyz.com are equivalent. Whatever you do, do not try to use
wildcards (e.g. "john@*.xyz.com") as this is not the correct
syntax. The author search box uses the same syntax as the subject and "Search for:" boxes.

The "since" and "until" search boxes

It is not uncommon for popular mailing lists
to have archives spanning 10 or more years of activity. If the mailing list
is about technology, you may not be interested in messages that are older
than a few years. Alternatively, you may know approximately when
the information you are looking for was posted to the list.
You can use the "Since" and "Until" boxes to restrict your search
accordingly. The syntax is very flexible and you can specify a date
and/or time in just about any of the commonly used formats:

- 23 Jun 1986 (self-explanatory).
- 1986-06-23 (international
date format).
- 1995 or just 95 selects 1 Jan 1995 for
the "since" box or 31 Dec 1995 for the "until" box.
- APR selects
April of the current year, 1st or 30th depending on whether this was entered
in the "since" or "until" box.
- APRIL 95: same as above,
but for the year 1995.
- TODAY-7 (7 days ago) makes it easy to
get a list of all the messages posted in the past week. You can also use
YESTERDAY or TODAY for a shorter time span.
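
To make the rules in this list concrete, here is a minimal Python sketch of how such entries could be resolved to calendar dates. It is an illustration under stated assumptions, not LISTSERV's parser: the function name resolve and its parameters are hypothetical, two-digit years are assumed to fall in the 1900s (as in the "95" example), times of day are ignored, and only the formats listed above are handled.

```python
import calendar
import re
from datetime import date, timedelta

MONTH_NAMES = ["JANUARY", "FEBRUARY", "MARCH", "APRIL", "MAY", "JUNE", "JULY",
               "AUGUST", "SEPTEMBER", "OCTOBER", "NOVEMBER", "DECEMBER"]


def _year(text: str) -> int:
    # Assumption: two-digit years belong to the 1900s, as in the "95" example.
    value = int(text)
    return value + 1900 if value < 100 else value


def _month(text: str) -> int:
    # Accept full names or abbreviations of at least three letters (APR, APRIL).
    for number, name in enumerate(MONTH_NAMES, start=1):
        if len(text) >= 3 and name.startswith(text):
            return number
    raise ValueError(f"unknown month: {text!r}")


def resolve(text: str, box: str, today: date) -> date:
    """Resolve an entry for the "since" or "until" box to a date."""
    entry = text.strip().upper()
    is_since = box == "since"

    m = re.fullmatch(r"(TODAY|YESTERDAY)(?:-(\d+))?", entry)
    if m:
        base = today if m.group(1) == "TODAY" else today - timedelta(days=1)
        return base - timedelta(days=int(m.group(2) or 0))

    m = re.fullmatch(r"(\d{4})-(\d{2})-(\d{2})", entry)  # 1986-06-23
    if m:
        return date(int(m.group(1)), int(m.group(2)), int(m.group(3)))

    m = re.fullmatch(r"(\d{1,2}) ([A-Z]+) (\d{4}|\d{2})", entry)  # 23 Jun 1986
    if m:
        return date(_year(m.group(3)), _month(m.group(2)), int(m.group(1)))

    if re.fullmatch(r"\d{4}|\d{2}", entry):  # 1995 or 95
        year = _year(entry)
        return date(year, 1, 1) if is_since else date(year, 12, 31)

    m = re.fullmatch(r"([A-Z]+)(?: (\d{4}|\d{2}))?", entry)  # APR or APRIL 95
    if m:
        year = _year(m.group(2)) if m.group(2) else today.year
        month = _month(m.group(1))
        last_day = calendar.monthrange(year, month)[1]
        return date(year, month, 1 if is_since else last_day)

    raise ValueError(f"unsupported date format: {text!r}")


today = date(2024, 5, 15)  # fixed "current date" so the results are repeatable
print(resolve("TODAY-7", "since", today))   # 2024-05-08
print(resolve("95", "until", today))        # 1995-12-31
print(resolve("APRIL 95", "since", today))  # 1995-04-01
```

Passing the box name in explicitly mirrors the way the same entry (a bare year or month) expands to the first day in the "since" box and to the last day in the "until" box.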

IMPORTANT: The US date format (mm/dd or mm/dd/yy) is not supported
because it is ambiguous. Many other countries use dd/mm or dd/mm/yy instead,
and to avoid ambiguities LISTSERV only supports the international date
format, yyyy-mm-dd or yy/mm/dd.

Search tips

Here are a few tips which may prove useful if you are not getting anywhere
with your search.

- In most cases, you will save a lot of time by
using the "Since" and "Until" boxes to narrow your
search to a particular date range, even if it is very approximate.
- If
you know the author of the message and have his e-mail address, use the author search box to restrict your search.
- If you
know the author's name, but not his e-mail address, add his name to the "Search for:" box. Hopefully it will be somewhere in the
message header or text, and this will help narrow the search. Make sure
to clearly separate the name from the rest of the search. If you were
looking for computer stores and know that the message you are
looking for was written by Mary Travis, your new search should be for
(computer stores) and (Mary Travis). If you just search for computer stores
Mary Travis, the four words will have to be close to each other or
there will be no match.
- Make sure to read the notes on
non-English searches if you are conducting a search in a language that
uses non-English characters.
- An easy way to find a recent message is to
make a search with TODAY-7 in the "Since" box, leaving all the
other boxes empty. You can bookmark the resulting URL and come back to it
regularly to see all the messages posted in the last week.

Non-English searches

Every effort has been made to
make ISO-8859-* searches work as transparently as possible, in spite of the
complexity of the situation. In order to better understand the cases where
searches do not actually work as expected, you should know that the messages
are archived in the format in which they were originally sent. This will
typically include a mix of native 8-bit text, MIME quoted-printable text,
MIME base64 text, and other proprietary encoding methods such as
WINMAIL.DAT, plus of course 7-bit text. Each of these formats presents its
own challenges:

- Native 8-bit text normally produces the
expected results. See below for a list of generic problems that may affect
even native 8-bit text.
- MIME quoted-printable text will, in most
cases, produce the expected results. Conceptually, the search is carried out
as though the =xx escape sequences had been replaced with their
corresponding characters before beginning the search. However, soft line
breaks (trailing '=' signs) are not processed (the lines are not merged). If
the poster's mail client uses soft line breaks to split words in the middle,
those words will not be matched. For instance, if the word "house" were written
as "hou=" on one line followed by "se" on the next line, LISTSERV would not
find a match with the search string "house".
- MIME base64 text is
not supported by the search interface. This type of encoding should only be
used for binary data, because it is totally unintelligible to people without
a MIME user interface and because it is context sensitive (that is, LISTSERV
would have to decode the entire message before beginning the search).
- Proprietary encoding methods such as WINMAIL.DAT are not
supported by the search interface. In most cases, these formats suffer from
the same kind of problems as MIME base64 text, and the mail programs that
generate these messages are being replaced with MIME-capable programs.
- 7-bit text (with national characters) does not work at all. It is
impossible to translate this text to native 8-bit form without knowing the
language in which it is written.
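
The quoted-printable limitation described above can be reproduced in a few lines of Python. The sketch below is a model of the described behavior, not LISTSERV's code: the hypothetical function decode_escapes_per_line replaces =XX escapes one line at a time and never merges lines, while Python's standard quopri module performs a full decode that also removes soft line breaks. The sample text and the ISO-8859-1 charset are assumptions chosen for illustration.

```python
import quopri
import re

# Two physical lines of a quoted-printable body; the first ends in a soft
# line break ("=") that splits the word "house" in the middle.
raw_lines = [b"The caf=E9 is next to the hou=", b"se."]


def decode_escapes_per_line(line: bytes) -> bytes:
    """Replace =XX escapes within one line; a trailing '=' is left untouched."""
    return re.sub(rb"=([0-9A-Fa-f]{2})",
                  lambda m: bytes([int(m.group(1), 16)]),
                  line)


# Per-line view: escapes are resolved, but the two lines are never merged.
per_line = [decode_escapes_per_line(l).decode("iso-8859-1") for l in raw_lines]

# Full decode: soft line breaks are removed and the lines are joined.
full = quopri.decodestring(b"\n".join(raw_lines)).decode("iso-8859-1")

print(any("café" in line for line in per_line))   # True: escape handled
print(any("house" in line for line in per_line))  # False: split word missed
print("house" in full)                            # True
```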

In addition, there are a number of generic problems that affect all message
formats:

- Code page: a typical international archive will contain messages
in a variety of incompatible code pages (Latin-1, Icelandic, etc.).
While LISTSERV knows the code page of each of the individual messages, it
does not know the code page of the search string you are entering, nor does
it support searches that span multiple code pages. If you search for one of
the characters in the Icelandic code page, LISTSERV may incorrectly match
messages written in another code page in which this character is not
present, but where another character with the same binary code was found in
the message.
- Case-insensitive searches: special tables are
required to properly evaluate case-insensitive searches with non-ASCII
characters. The tables LISTSERV uses were designed for the Latin-1
(ISO-8859-1) code page and may not give correct results with other code
pages.
- EBCDIC systems: LISTSERV servers running on EBCDIC systems
may give incorrect results due to the multiple ASCII-EBCDIC translation
steps involved in processing your request. The TCP/IP product, the SMTP
server, the web server and LISTSERV each have their own tables, which may or
may not be identical.
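
The code page problem in the list above comes down to one byte value standing for different characters in different code pages. The following Python sketch shows how a byte-level comparison can produce a false match. The choice of ISO-8859-9 (Turkish) as the "other" code page, the sample word, and both function names are assumptions made for illustration; this is not how LISTSERV is implemented.

```python
# The Icelandic letter "ð" is byte 0xF0 in ISO-8859-1 (Latin-1). In ISO-8859-9
# the same byte is the Turkish letter "ğ", and "ð" does not exist at all.
message_bytes = "dağ".encode("iso-8859-9")  # Turkish word stored as b"da\xf0"
search_bytes = "ð".encode("iso-8859-1")     # search string typed as Latin-1


def naive_match(needle: bytes, haystack: bytes) -> bool:
    """Compare raw bytes, ignoring which code page each side uses."""
    return needle in haystack


def charset_aware_match(needle: str, haystack: bytes, charset: str) -> bool:
    """Decode the message with its own code page before comparing characters."""
    return needle in haystack.decode(charset)


print(naive_match(search_bytes, message_bytes))                # True: false match
print(charset_aware_match("ð", message_bytes, "iso-8859-9"))   # False
```

The second function only works because the caller knows the code page of both the message and the search string, which is exactly the information that is missing for the search string in the situation described above.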