The functionality of the full-text search engine within the GCMD database has been improved with new features and more intuitive search functions that closely resemble the behavior of commercial search engines. The following was adapted from the Jakarta Lucene query syntax guide.
As with most modern full-text search engines, a query is divided into terms and operators. There are two types of terms: single terms (entered one or several at a time) and phrases.
Type in a single term such as ozone and press Enter to retrieve a list of relevant titles.
Type in multiple terms such as ozone TOMS and the search engine will interpret this as ozone AND TOMS and retrieve only those descriptions with both the words ozone and TOMS somewhere in the description.
Type in a phrase as a group of words surrounded by double quotes, such as "sea ice", and press Enter to retrieve a list of descriptions. The descriptions retrieved will contain those words together, in the same order, somewhere in the description. In this example, every description returned will contain the phrase "sea ice".
Note: The search engine is not case sensitive. The same results are returned whether you type antarctica, ANTARCTICA, or Antarctica.
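The difference between entering several terms and entering a phrase can be sketched in Python. This is a minimal illustration, not the GCMD implementation: it assumes a simple tokenizer that lowercases text (mirroring the case-insensitivity noted above) and splits on non-word characters, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    # Lowercase first so that matching ignores case, then split into words.
    return re.findall(r"\w+", text.lower())


def matches_all_terms(description: str, terms: list[str]) -> bool:
    """Multiple terms (implicit AND): each term must appear somewhere."""
    words = set(tokenize(description))
    return all(term.lower() in words for term in terms)


def matches_phrase(description: str, phrase: str) -> bool:
    """Phrase: the words must appear next to each other, in the same order."""
    words = tokenize(description)
    target = tokenize(phrase)
    n = len(target)
    return any(words[i:i + n] == target for i in range(len(words) - n + 1))


desc = "Antarctic sea ice extent derived from TOMS ozone data"
print(matches_all_terms(desc, ["ozone", "TOMS"]))                 # True
print(matches_phrase(desc, "SEA ICE"))                            # True
print(matches_phrase("Ice cover on the sea surface", "sea ice"))  # False
```

The last line shows why quotes matter: "Ice cover on the sea surface" contains both words, so it would satisfy sea ice entered as two terms, but not the phrase "sea ice".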
Boolean Search
Boolean operators are offered through the following words and symbols:
AND or "+": Two or more terms or phrases must all be in the description. AND is the default operator.
OR: At least one of the specified terms or phrases must be in the description.
NOT or "-": The specified term or phrase is excluded from the search.
Note: The Boolean operators OR and NOT must be specified explicitly and must be in CAPITAL LETTERS. If you enter multiple terms without any operators or quotes, an AND operator is assumed.
Search Example:
ozone TOMS polar antarctica
The search engine interprets this query as 'ozone AND TOMS AND polar AND antarctica' and will return only descriptions that contain all of those words.
AND Query Examples:
If two (or more) terms are specified in a query, such as sea winds, only descriptions containing all of the terms will be retrieved. An AND Boolean operator is assumed between the terms (sea AND winds). This is equivalent to an intersection of sets. The symbol && can be used in place of the word AND.
OR Query Example:
sea OR topography
If two (or more) terms in a query are separated by the Boolean operator OR, descriptions containing any of those terms will be retrieved. (Note that the Boolean operator MUST be capitalized; otherwise the search engine will treat it as a word to be searched.) This query will retrieve descriptions with either sea or topography, or both. As a general rule, an OR query returns at least as many hits as the corresponding AND query. This is equivalent to a union of sets. The symbol || can be used in place of the word OR.
NOT Query Examples:
sea NOT ice
In this query, the Boolean operator NOT separates the two terms. This query will retrieve all descriptions with the word "sea" but not the word "ice". In other words, if a description contains both sea and ice, it will not be retrieved. This is equivalent to a difference of sets. The symbol "!" can be used in place of the word NOT.
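Because AND, OR, and NOT correspond to set intersection, union, and difference, their behavior can be sketched with Python sets over a toy inverted index. The index contents and function names are hypothetical and only illustrate the semantics described above; they are not how the GCMD engine stores its data.

```python
# Hypothetical toy inverted index: term -> IDs of descriptions that contain it.
INDEX = {
    "sea": {1, 2, 3},
    "ice": {2, 3},
    "winds": {1, 4},
    "topography": {4, 5},
}


def postings(term: str) -> set[int]:
    """IDs of descriptions containing the term (case-insensitive lookup)."""
    return INDEX.get(term.lower(), set())


def and_query(a: str, b: str) -> set[int]:
    # AND / &&: intersection, both terms must be present.
    return postings(a) & postings(b)


def or_query(a: str, b: str) -> set[int]:
    # OR / ||: union, either term (or both) may be present.
    return postings(a) | postings(b)


def not_query(a: str, b: str) -> set[int]:
    # NOT / !: difference, descriptions with the first term but not the second.
    return postings(a) - postings(b)


print(sorted(and_query("sea", "winds")))      # [1]
print(sorted(or_query("sea", "topography")))  # [1, 2, 3, 4, 5]
print(sorted(not_query("sea", "ice")))        # [1]
```

Since a union always contains the corresponding intersection, the OR result can never be smaller than the AND result for the same terms.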
Fielded Searching
The search engine allows you to restrict your search to any DIF or SERF metadata field (see the DIF and SERF user guides for the list of metadata fields). The syntax is as follows:
DIF/dif_field_name:query
or
SERF/serf_field_name:query
For example, if you restrict a search for AVHRR to the DIF title field, only those descriptions with "AVHRR" in the title will be returned. Likewise, a search for software restricted to the title field returns only those descriptions with "software" in the title.
Fielded searching also allows you to drill down through subfields. For example, you can specify the exact Science Keyword (Parameter) hierarchy or Personnel field to conduct your search:
A search for "carbon dioxide" restricted to the Variable_Level_1 subfield of the Parameter hierarchy will return all descriptions with the phrase "carbon dioxide" as a Variable_Level_1 keyword.
A search for "carbon dioxide" restricted to the Parameter field as a whole will return all descriptions with the phrase "carbon dioxide" anywhere within the parameter field.
A search for Smith restricted to the last-name subfield of the Personnel field will return all descriptions with a Personnel entry whose last name is "Smith".
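As an illustration of the PREFIX/field_name:query syntax, the Python sketch below assembles a fielded clause and wraps multi-word queries in double quotes so they are searched as phrases. The function name and the field name title_field are hypothetical placeholders; the real field names are listed in the DIF and SERF user guides. The sketch does not attempt the subfield (hierarchy) syntax, which is not spelled out here.

```python
def fielded_query(prefix: str, field: str, query: str) -> str:
    """Build a fielded clause of the form PREFIX/field_name:query.

    Multi-word queries are wrapped in double quotes so that they are
    searched as a phrase instead of as separate AND-ed terms.
    """
    if prefix not in ("DIF", "SERF"):
        raise ValueError("prefix must be 'DIF' or 'SERF'")
    q = query.strip()
    if " " in q and not q.startswith('"'):
        q = f'"{q}"'
    return f"{prefix}/{field}:{q}"


# "title_field" is a hypothetical field name used only for illustration.
print(fielded_query("DIF", "title_field", "AVHRR"))            # DIF/title_field:AVHRR
print(fielded_query("SERF", "title_field", "carbon dioxide"))  # SERF/title_field:"carbon dioxide"
```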
Modified Queries
The search engine supports the following term modifiers for
enhanced searching options:
Wildcard searches
Fuzzy searches
Proximity searches
Range searches
Term boosting
Wildcard Searches
The search engine supports single and multiple character wildcard searches.
To perform a single character wildcard search, use the "?" symbol.
To perform a multiple character wildcard search, use the "*" symbol.
The single character wildcard search looks for terms that match with a single character replaced. For example, to search for "text" or "test", you can use the search:
te?t
Multiple character wildcard searches look for 0 or more characters. For example, to search for wind, winds, or windy, you can use the search:
wind*
You can also use the wildcard search in the middle of a term.
Note: You cannot use a "*" or "?" symbol as the first
character of a search.
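The two wildcard symbols can be illustrated by translating a pattern into a regular expression: "?" becomes exactly one character and "*" becomes zero or more characters, with everything else matched literally. This Python sketch is an assumption-level illustration of the matching rule, not the engine's own code; it also rejects a leading wildcard, following the note above.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a query wildcard pattern into a compiled regular expression.

    '?' matches exactly one character and '*' matches zero or more
    characters; every other character is matched literally, ignoring case.
    """
    if pattern[:1] in ("*", "?"):
        raise ValueError("a wildcard cannot be the first character")
    parts = []
    for ch in pattern:
        if ch == "?":
            parts.append(".")
        elif ch == "*":
            parts.append(".*")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)


def wildcard_match(pattern: str, term: str) -> bool:
    # The whole term must match, not just a prefix or substring.
    return wildcard_to_regex(pattern).fullmatch(term) is not None


print(wildcard_match("te?t", "text"), wildcard_match("te?t", "test"))  # True True
print([w for w in ["wind", "winds", "windy", "wild"] if wildcard_match("wind*", w)])
# ['wind', 'winds', 'windy']
```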
Fuzzy Searches
The search engine supports fuzzy searches based on the Levenshtein distance, or edit distance, algorithm. To do a fuzzy search, use the tilde, "~", symbol at the end of a single-word term. For example, to search for a term similar in spelling to "roam", use the fuzzy search:
roam~
This search will identify terms like foam and roams.
An additional (optional) parameter, a number between 0 and 1, may be used to specify the required similarity. With a value closer to 1, only terms with a higher similarity will be matched. For example:
roam~0.8
Note: The default is 0.5.
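The edit distance behind fuzzy search counts the single-character insertions, deletions, and substitutions needed to turn one word into another. The Python sketch below computes it with the standard dynamic-programming recurrence. How the distance is converted into the 0 to 1 similarity is an assumption here: it uses 1 minus the distance divided by the shorter word's length, loosely modeled on older Lucene FuzzyQuery behavior, and the exact formula and threshold comparison used by the GCMD engine are not documented in this guide.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, or substitutions."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def similarity(a: str, b: str) -> float:
    # Assumed formula: 1 - distance / length of the shorter word.
    shorter = min(len(a), len(b))
    if shorter == 0:
        return 1.0 if a == b else 0.0
    return 1.0 - levenshtein(a.lower(), b.lower()) / shorter


def fuzzy_match(query: str, term: str, min_similarity: float = 0.5) -> bool:
    # The default threshold of 0.5 mirrors the documented default.
    return similarity(query, term) >= min_similarity


print(levenshtein("roam", "foam"))       # 1
print(fuzzy_match("roam", "foam"))       # True  (similarity 0.75)
print(fuzzy_match("roam", "foam", 0.8))  # False (0.75 is below 0.8)
```

Under this assumed formula, raising the parameter from 0.5 to 0.8 is enough to exclude foam and roams, which is the "closer to 1, stricter match" behavior described above.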
Proximity Searches
The search engine supports finding words that are within a specific distance of each other. To do a proximity search, use the tilde, "~", symbol at the end of a phrase. For example, to search for "greenhouse" and "carbon" within 10 words of each other in a description, use the search:
"greenhouse carbon"~10
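A simplified version of the proximity test can be written in Python by comparing word positions. This sketch treats "within N words" as a difference of at most N positions in either order; that is an assumption for illustration, since Lucene's actual phrase slop counts the position moves needed to put the words in phrase order, so reversed word order costs more there.

```python
import re


def within_distance(text: str, word_a: str, word_b: str, max_distance: int) -> bool:
    """True if word_a and word_b occur within max_distance word positions of each other.

    Simplification: order is ignored and distance is the difference in positions.
    """
    words = re.findall(r"\w+", text.lower())
    pos_a = [i for i, w in enumerate(words) if w == word_a.lower()]
    pos_b = [i for i, w in enumerate(words) if w == word_b.lower()]
    return any(abs(i - j) <= max_distance for i in pos_a for j in pos_b)


text = "Greenhouse gas inventories report annual emissions of methane and carbon"
print(within_distance(text, "greenhouse", "carbon", 10))  # True  (9 positions apart)
print(within_distance(text, "greenhouse", "carbon", 5))   # False
```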
Range Searches
Range queries allow one to match descriptions whose field values are between the lower and upper bounds specified by the range query. Range queries can be inclusive or exclusive of the upper and lower bounds. Inclusive range queries are denoted by square brackets [ ]; exclusive range queries are denoted by curly brackets { }. Values are compared lexicographically (in dictionary order).
For example, an exclusive range query on the title field with the bounds {Greenhouse TO IPCC} will identify all descriptions with titles between Greenhouse and IPCC, but not including Greenhouse and IPCC.
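The inclusive and exclusive bounds amount to two different string comparisons. This Python sketch shows them under the assumption that values are lowercased before comparing, to stay consistent with the case-insensitive behavior described earlier; the function name is hypothetical, and the engine's own comparison rules may differ in detail.

```python
def in_range(value: str, lower: str, upper: str, inclusive: bool) -> bool:
    """Lexicographic range test: [lower TO upper] if inclusive, {lower TO upper} if not.

    Assumption: comparison is done on lowercased strings.
    """
    v, lo, hi = value.lower(), lower.lower(), upper.lower()
    if inclusive:
        return lo <= v <= hi
    return lo < v < hi


print(in_range("Hurricane Tracks", "Greenhouse", "IPCC", inclusive=False))  # True
print(in_range("Greenhouse", "Greenhouse", "IPCC", inclusive=False))        # False
print(in_range("Greenhouse", "Greenhouse", "IPCC", inclusive=True))         # True
```

Note that "Global Ozone" falls outside this range because "gl" sorts before "gr".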
Boosting a Term
The search engine ranks matching descriptions by relevance, based on the terms found. To boost a term, use the caret, "^", symbol with a boost factor (a number) at the end of the term you are searching. The higher the boost factor, the more relevant the term will be.
Boosting allows you to control the relevance of a description by boosting its terms. For example, if you are searching for greenhouse carbon and you want the term "greenhouse" to be more relevant, you can boost it by placing the ^ symbol and the boost factor right after the term. You would type:
greenhouse^4 carbon
This will make descriptions with the term "greenhouse" appear more relevant. You can also boost phrases by placing the caret and boost factor after the closing quotation mark, for example "sea ice"^4.
Note: By default, the boost factor is 1. Although the boost factor must be positive, it can be less than 1 (e.g., 0.2).
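The effect of a boost on ranking can be shown with a deliberately simplified scoring model. In this Python sketch, a description's score is just the sum of the boosts of the query terms it contains, with unboosted terms counting as 1. That is an assumption for illustration only: real Lucene scoring also weighs term frequency, rarity across descriptions, and field length, with the boost multiplying a term's contribution. The parser handles single-word terms only, not phrases, and the function names are hypothetical.

```python
def parse_boosts(query: str) -> dict[str, float]:
    """Parse single-word terms like 'greenhouse^4 carbon'; unboosted terms default to 1."""
    weights = {}
    for token in query.split():
        term, _, boost = token.partition("^")
        weights[term.lower()] = float(boost) if boost else 1.0
    return weights


def boosted_score(description: str, weights: dict[str, float]) -> float:
    """Toy relevance score: sum of the boosts of the query terms found."""
    words = set(description.lower().split())
    return sum(boost for term, boost in weights.items() if term in words)


query = parse_boosts("greenhouse^4 carbon")
descriptions = ["carbon cycle model", "greenhouse gas trends"]
ranked = sorted(descriptions, key=lambda d: boosted_score(d, query), reverse=True)
print(ranked)  # ['greenhouse gas trends', 'carbon cycle model']
```

Without the ^4, both descriptions would score 1 and neither would be favored.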
Grouping
The search engine supports using parentheses to group clauses to form subqueries. This can be very useful if you want to control the Boolean logic for a query.
To search for either "greenhouse" or "carbon" and "emissions", use the query:
(greenhouse OR carbon) AND emissions
This eliminates any confusion and ensures that emissions must exist and that at least one of the terms greenhouse or carbon must also exist.
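The effect of the parentheses can be checked with Python sets, reusing the idea that OR is a union and AND is an intersection. The toy index below is hypothetical; it only shows that grouping the OR first gives a different result from grouping the clauses the other way.

```python
# Hypothetical toy index: term -> IDs of descriptions that contain it.
INDEX = {
    "greenhouse": {1, 2},
    "carbon": {3, 4},
    "emissions": {2, 3, 5},
}


def ids(term: str) -> set[int]:
    return INDEX.get(term, set())


# (greenhouse OR carbon) AND emissions: the parenthesized OR is evaluated first.
grouped = (ids("greenhouse") | ids("carbon")) & ids("emissions")

# greenhouse OR (carbon AND emissions): a different grouping of the same terms.
alternative = ids("greenhouse") | (ids("carbon") & ids("emissions"))

print(sorted(grouped))      # [2, 3]
print(sorted(alternative))  # [1, 2, 3]
```

Description 1 mentions greenhouse but not emissions, so it is excluded only when the query is grouped as (greenhouse OR carbon) AND emissions.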
Field Grouping
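The search engine supports using parentheses to group multiple clauses to a single field.
To search for a title that contains both the word "emissions" and the phrase "global warming", use a query of the form:
DIF/dif_field_name:(+emissions +"global warming")
where dif_field_name is the name of the DIF title field.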
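A grouped field query like the one above can be assembled programmatically. This Python sketch marks every clause as required with "+" and quotes multi-word clauses so they are matched as phrases. The function name and the field name title_field are hypothetical placeholders; substitute the actual DIF title field name from the DIF user guide.

```python
def field_group(prefix: str, field: str, required: list[str]) -> str:
    """Build PREFIX/field_name:(+clause +clause ...) so every clause is required.

    Multi-word clauses are quoted so that they are matched as phrases.
    """
    clauses = []
    for item in required:
        item = item.strip()
        if " " in item and not item.startswith('"'):
            item = f'"{item}"'
        clauses.append("+" + item)
    return f"{prefix}/{field}:({' '.join(clauses)})"


# "title_field" is a hypothetical placeholder for the DIF title field name.
print(field_group("DIF", "title_field", ["emissions", "global warming"]))
# DIF/title_field:(+emissions +"global warming")
```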