CQL: Contextual Query Language (SRU Version 1.2 Specifications)
Note: in version 1.1 CQL stands for
"Common Query Language". In version 1.2 it is changed to "Contextual
Query Language".
CQL, the Contextual Query Language, is a formal language for
representing queries to information retrieval systems such as web indexes,
bibliographic catalogs and museum collection information. The design objective
is that queries be human readable and writable, and that the language
be intuitive while maintaining the expressiveness of more complex languages.
Traditionally, query languages have fallen into two camps: powerful,
expressive languages that are not easily readable or writable by non-experts
(e.g. SQL, PQF, and XQuery); or simple and intuitive languages not powerful
enough to express complex concepts (e.g. CCL and Google). CQL tries to
combine simplicity and intuitiveness of expression for simple, everyday
queries, with the richness of more expressive languages to accommodate
complex concepts when necessary.
Query Syntax
- CQL Query
A CQL query consists of either a single search clause [example 1], or
multiple search clauses connected by boolean operators [example 2].
It may have a sort specification at the end, following the 'sortBy'
keyword [example 3]. In addition it may include prefix assignments which
assign short names to context set identifiers [example 4].
Examples:
- dc.title any fish
- dc.title any fish or dc.creator any sanderson
- dc.title any fish sortBy dc.date/sort.ascending
- > dc = "info:srw/context-sets/1/dc-v1.1" dc.title any fish
- Search Clause
A search clause consists of either an index, relation and a search term
[example 1], or a search term by itself [example 2]. If the clause consists
of just a term, then the index is treated as 'cql.serverChoice', and
the relation is treated as '=' [example 3]. (Treated
differently in versions 1.1 and 1.2. See note
1.)
Examples:
- dc.title any fish
- fish
- cql.serverChoice = fish
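The defaulting rule above can be sketched in a few lines. This is only an illustration, assuming version 1.2 defaults and a clause that has already been split into tokens; the function name normalize_clause is hypothetical.

```python
def normalize_clause(tokens):
    """Expand a search clause given as a list of tokens.

    A one-token clause is a bare search term; a three-token clause is
    index, relation, search term. (Version 1.2 defaults are assumed.)
    """
    if len(tokens) == 1:
        # Term only: index defaults to cql.serverChoice, relation to '='.
        return ("cql.serverChoice", "=", tokens[0])
    if len(tokens) == 3:
        return tuple(tokens)
    raise ValueError("a search clause has either one or three parts")


print(normalize_clause(["fish"]))
print(normalize_clause(["dc.title", "any", "fish"]))
```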
- Search Term
Search terms MAY be enclosed in double quotes [example 1], though need
not be [example 2]. Search terms MUST be enclosed in double quotes if
they contain any of the following characters: < > = / ( ) and
whitespace [example 3]. The search term may be an empty string [example
4], but must be present in a search clause. The empty search term has
no defined semantics.
Examples:
- "fish"
- fish
- "squirrels fish"
- ""
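A client that builds queries from user input needs to decide when to quote a term. The sketch below applies the rule above; it additionally quotes the empty string (so that the term is still present) and backslash-escapes embedded double quotes, as the conformance section requires. The helper name quote_term is an assumption of this example.

```python
import re

# Characters that force quoting according to the rule above:
# < > = / ( ) and whitespace.
_NEEDS_QUOTES = re.compile(r'[<>=/()\s]')


def quote_term(term):
    """Return the term in a form that is safe to place in a CQL query."""
    if term == "" or '"' in term or _NEEDS_QUOTES.search(term):
        # Embedded double quotes are escaped with a backslash.
        return '"' + term.replace('"', '\\"') + '"'
    return term


print(quote_term("fish"))
print(quote_term("squirrels fish"))
```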
- Index Name
An index name always includes a base name [example 1] and may also include
a prefix [example 2], which determines the context set of which the
index is a part. The base name and the prefix are separated by a dot
character ('.'). If multiple '.' characters are present, then the first
should be treated as the prefix/base name delimiter. If the prefix is
not supplied, it is determined by the server.
Examples:
- title any fish
- dc.title any fish
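Splitting an index name at the first dot, as described above, might look like the following. The function name is hypothetical, and a missing prefix is reported as None so that the caller (in practice, the server) can apply its own default.

```python
def split_index_name(name):
    """Split an index name into (prefix, base name) at the first '.'."""
    prefix, dot, base = name.partition(".")
    if not dot:
        # No prefix supplied: the context set is determined by the server.
        return (None, name)
    return (prefix, base)


print(split_index_name("title"))
print(split_index_name("dc.title"))
```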
- Relation
The relation in a search clause specifies the relationship between the
index and search term. It also always includes a base name [example
1] and may also include a prefix providing a context for the relation
[example 2]. If a relation does not have a prefix, the context set is
'cql'. If no relation is supplied in a search clause, then = is assumed,
which means that the relation is determined by the server. See note 1 regarding
version differences.
Examples:
- dc.title any fish
- dc.title cql.any fish
- Relation Modifiers
Relations may be modified by one or more relation modifiers. Relation
modifiers always include a base name, and may include a prefix for a
context set as above [example 1]. If a prefix is not supplied, the context
set is 'cql'. Relation modifiers are separated from each other and from
the relation by forward slash characters ('/'). Whitespace may be present
on either side of a '/' character, but the relation plus modifiers group
may not end in a '/' [example 2]. Relation modifiers may also have a
comparison symbol and a value. The comparison symbol is any of = <
<= > >= <>. The value must obey the same rules for quoting
as search terms, above [example 3].
Examples:
- dc.title any/relevant fish
- dc.title any/ relevant /cql.string fish
- dc.title any/rel.algorithm=cori fish
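As a rough illustration of the rules above, the following sketch splits a relation and its modifiers. It assumes the relation-plus-modifiers group has already been isolated from the rest of the query and that no modifier value contains a '/' (such a value would have to be quoted, which this simple split does not handle). The name parse_relation is hypothetical.

```python
import re

# name, then optionally a comparison symbol and a value.
_MODIFIER = re.compile(r'^\s*([^\s=<>]+)\s*(?:(<=|>=|<>|=|<|>)\s*(.+?))?\s*$')


def parse_relation(text):
    """Return (relation, [(modifier name, symbol, value), ...])."""
    parts = text.split("/")
    relation = parts[0].strip()
    if not relation or any(not p.strip() for p in parts[1:]):
        # Covers the case where the group ends in a '/'.
        raise ValueError("empty relation or modifier")
    modifiers = []
    for part in parts[1:]:
        match = _MODIFIER.match(part)
        if match is None:
            raise ValueError("malformed modifier: " + part)
        # symbol and value are None when the modifier has no comparison.
        modifiers.append(match.groups())
    return relation, modifiers


print(parse_relation("any/ relevant /cql.string"))
print(parse_relation("any/rel.algorithm=cori"))
```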
- Boolean Operators
Search clauses may be linked by boolean operators. These are: and,
or, not and prox [example 1]. Note that not is 'and-not' and must
not be used as a unary operator. Boolean operators all have the same
precedence; they are evaluated left-to-right. Parentheses may be used
to override left-to-right evaluation [example 2].
Examples:
- dc.title any fish or dc.creator any sanderson
- dc.title any fish or (dc.creator any sanderson and dc.identifier = "id:1234567")
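The effect of equal precedence and left-to-right evaluation can be shown with plain sets. In this sketch each search clause is assumed to have already produced a set of record identifiers (the sets a, b and c are made up for illustration); prox is left out because it needs positional information that sets do not carry.

```python
def combine(left, op, right):
    """Apply one boolean operator to two result sets."""
    op = op.lower()
    if op == "and":
        return left & right
    if op == "or":
        return left | right
    if op == "not":
        # 'not' is and-not: records in left that are not in right.
        return left - right
    raise ValueError("unsupported operator: " + op)


def evaluate(first, *rest):
    """Evaluate operand, op, operand, op, ... strictly left to right."""
    result = first
    for op, operand in zip(rest[0::2], rest[1::2]):
        result = combine(result, op, operand)
    return result


a, b, c = {1, 2}, {2, 3}, {3, 4}
print(evaluate(a, "or", b, "and", c))             # (a or b) and c -> {3}
print(evaluate(a, "or", evaluate(b, "and", c)))   # a or (b and c) -> {1, 2, 3}
```

The nested call plays the role of parentheses in the query: it forces the inner group to be evaluated first.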
- Boolean Modifiers
Booleans may be modified by one or more boolean modifiers, separated
as per relation modifiers with '/' characters. Again, boolean modifiers
consist of a base name and may include a prefix determining the modifier's
context set [example 1]. If not supplied, then the context set is 'cql'.
As per relation modifiers, they may also have a comparison symbol and
a value [example 2].
Examples:
- dc.title any fish or/rel.combine=sum dc.creator any sanderson
- dc.title any fish prox/unit=word/distance>3 dc.title any squirrel
- Proximity Modifiers
Basic proximity modifiers are defined in the CQL
context set. Proximity units
'word', 'sentence', 'paragraph', and 'element' are defined in the CQL
context set, and may also be defined in other context sets. Within
the CQL set they are explicitly undefined. When defined in another
context set they may be assigned specific meaning.
Thus compare "prox/unit=word" with "prox/xyz.unit=word".
In the first, 'unit' is a prox modifier from the CQL set, and as
such its values are undefined, so 'word' is subject to interpretation
by the server. In the second, 'unit' is a prox modifier defined by
the xyz context set, which may assign the unit 'word' a specific
meaning.
The context set xyz may define additional units, for example, 'street':
prox/xyz.unit="street"
Note that this approach, 'prox/xyz.unit="street"', is preferable to
'prox/unit=xyz.street'. In the first case, 'unit' is a modifier defined
in the xyz context set, and 'street' is a value defined for that modifier.
In the second, 'unit' is a modifier from the cql context set, paired with
a value defined in a different set, whereas its value would have to be
one that is defined in the cql context set. Pairing a modifier from
one set with a value from another is not a good practice.
- Sorting (See note 2 regarding version differences.)
Queries may include explicit information on how to sort the result set
generated by the search. The sort specification is included at the end,
and is separated by a 'sortBy' keyword. The specification consists of
an ordered list of indexes, potentially with modifiers, to use as keys
on which to sort the result set. If multiple keys are given, then the
second and subsequent keys should be used to determine the order of
items that would otherwise sort together. Each index used as a sort
key has the same semantics as when it is used to search.
Modifiers may be attached to the index in the same way as to booleans
and relations in the main part of the query. These modifiers may be
part of any context set, but the CQL context set and the Sort context
set are especially important. Whether a modifier may be used in this way
should be stated in the description of its semantics, and this is the
only context in which modifiers may be attached to indexes. As many types
of search also require specification of term order (for example the
<, > and within relations), these modifiers are often specified
as relation modifiers.
Examples:
- "cat" sortBy dc.title
- "dinosaur" sortBy dc.date/sort.descending dc.title/sort.ascending
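The multi-key rule above (later keys only break ties left by earlier ones) can be sketched with a stable sort. The records, the field names and the (field, descending) representation of a sort key are assumptions of this example; how a server maps dc.date or dc.title onto its own data is outside the sketch.

```python
def sort_records(records, sort_keys):
    """Sort by several keys, given as (field, descending) in priority order."""
    ordered = list(records)
    # Python's sort is stable, so sorting by the least significant key
    # first and the most significant key last gives the desired order.
    for field, descending in reversed(sort_keys):
        ordered.sort(key=lambda record: record[field], reverse=descending)
    return ordered


records = [
    {"title": "B", "date": "2001"},
    {"title": "A", "date": "2001"},
    {"title": "C", "date": "2005"},
]
# Roughly: sortBy dc.date/sort.descending dc.title/sort.ascending
result = sort_records(records, [("date", True), ("title", False)])
print([r["title"] for r in result])   # ['C', 'A', 'B']
```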
- Prefix Assignment
Warning: The use of Prefix Maps is very uncommon.
A Prefix Map may be used to assign context set names to specific identifiers
in order to be sure that the server maps them in a desired fashion.
It may occur at any place in the query and applies to anything below
the map in the query tree. A prefix assignment is specified by: '>'
shortname '=' identifier [example 1]. The shortname and '=' sign may
be omitted, in which case it sets a default context set for indexes
[example 2].
Examples:
- > dc = "http://deepcustard.org/" dc.custardDepth > 10
- > "http://deepcustard.org/" custardDepth > 10
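A minimal sketch of reading prefix assignments is given below. It only handles assignments at the very start of the query and only quoted identifiers, whereas the text above allows a prefix map at any place in the query; the function name is hypothetical. Short names are lower-cased because prefixes are case insensitive, and the default context set is stored under the key None.

```python
import re

# '>' [shortname '='] "identifier"
_PREFIX = re.compile(r'\s*>\s*(?:([^\s=<>/()"]+)\s*=\s*)?"([^"]*)"\s*')


def strip_prefix_assignments(query):
    """Return (prefix map, remaining query) for leading prefix assignments."""
    prefixes = {}
    pos = 0
    while True:
        match = _PREFIX.match(query, pos)
        if match is None:
            break
        shortname, identifier = match.groups()
        key = shortname.lower() if shortname else None
        prefixes[key] = identifier
        pos = match.end()
    return prefixes, query[pos:]


print(strip_prefix_assignments('> dc = "http://deepcustard.org/" dc.custardDepth > 10'))
print(strip_prefix_assignments('> "http://deepcustard.org/" custardDepth > 10'))
```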
- Case Insensitive
All parts of CQL are case insensitive apart from user supplied search
terms, values for modifiers and prefix map identifiers, which may or
may not be case sensitive. If any case insensitive part of CQL is specified
with both upper and lower case, it is for aesthetic purposes only.
Examples:
- dC.tiTlE any fish
- dc.TitlE Any/rEl.algOriThm=cori fish soRtbY Dc.TitlE
Notes:
- In version 1.2 the default relation is '=',
while in version 1.1 the default relation is 'scr'. In version 1.1
the '=' relation means "adjacency". In version 1.2 the '='
relation from version 1.1 is replaced by the new relation 'adj'.
- In version 1.1, a sort
parameter is included in the searchRetrieve operation. That parameter
is dropped in version 1.2 and instead the sort specification becomes
part of the CQL query. Additional
description of sorting in version 1.1.
BNF
Following is the Backus Naur Form (BNF) definition for CQL. ["::=" represents
"is defined as"]
sortedQuery ::= prefixAssignment sortedQuery
              | scopedClause ['sortby' sortSpec]
sortSpec ::= sortSpec singleSpec | singleSpec
singleSpec ::= index [modifierList]
Note: The above three assignments are new in version 1.2 to accommodate
the sortSpec.
cqlQuery ::= prefixAssignment cqlQuery
           | scopedClause
prefixAssignment ::= '>' prefix '=' uri
                   | '>' uri
scopedClause ::= scopedClause booleanGroup searchClause
               | searchClause
booleanGroup ::= boolean [modifierList]
boolean ::= 'and' | 'or' | 'not' | 'prox'
searchClause ::= '(' cqlQuery ')'
               | index relation searchTerm
               | searchTerm
relation ::= comparitor [modifierList]
comparitor ::= comparitorSymbol | namedComparitor
comparitorSymbol ::= '=' | '>' | '<' | '>=' | '<=' | '<>' | '=='
namedComparitor ::= identifier
modifierList ::= modifierList modifier | modifier
modifier ::= '/' modifierName [comparitorSymbol modifierValue]
prefix, uri, modifierName, modifierValue, searchTerm, index ::= term
term ::= identifier | 'and' | 'or' | 'not' | 'prox' | 'sortby'
identifier ::= charString1 | charString2
charString1 ::= Any sequence of characters that does not include any of the
following:
    whitespace
    ( (open parenthesis)
    ) (close parenthesis)
    =
    <
    >
    '"' (double quote)
    /
If the final sequence is a reserved word, that token is returned
instead. Note that '.' (period) may be included, and a sequence of
digits is also permitted. Reserved words are 'and', 'or', 'not', and
'prox' (case insensitive). When a reserved word is used in a search
term, case is preserved.
charString2 ::= Double quotes enclosing a sequence of any characters except
double quote (unless preceded by backslash (\)). Backslash escapes
the character following it. The resultant value includes all backslash
characters except those releasing a double quote (this allows other
systems to interpret the backslash character). The surrounding double
quotes are not included.
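The lexical rules for charString1 and charString2 can be illustrated with a small tokenizer. This is a sketch rather than a reference implementation: the token kinds ("word", "reserved", "symbol", "term") are names invented here, and 'sortby' is treated as a keyword alongside the four reserved words because it appears as a literal in the grammar above.

```python
import re

_TOKEN = re.compile(r'''
    \s*(?:
        (?P<quoted>"(?:[^"\\]|\\.)*")            # charString2
      | (?P<symbol><=|>=|<>|==|=|<|>|/|\(|\))    # comparison symbols, slash, parentheses
      | (?P<word>[^\s()=<>"/]+)                  # charString1
    )''', re.VERBOSE | re.DOTALL)

KEYWORDS = {"and", "or", "not", "prox", "sortby"}


def _unescape(match):
    # Drop only the backslashes that release a double quote; keep the rest.
    return '"' if match.group(1) == '"' else match.group(0)


def tokenize(query):
    """Split a CQL query string into (kind, text) pairs."""
    query = query.rstrip()
    tokens = []
    pos = 0
    while pos < len(query):
        match = _TOKEN.match(query, pos)
        if match is None:
            raise ValueError("cannot tokenize at offset %d" % pos)
        if match.group("quoted") is not None:
            inner = match.group("quoted")[1:-1]   # surrounding quotes removed
            tokens.append(("term", re.sub(r'\\(.)', _unescape, inner, flags=re.DOTALL)))
        elif match.group("symbol") is not None:
            tokens.append(("symbol", match.group("symbol")))
        else:
            word = match.group("word")
            kind = "reserved" if word.lower() in KEYWORDS else "word"
            tokens.append((kind, word))           # case is preserved
        pos = match.end()
    return tokens


print(tokenize('dc.title any/rel.algorithm=cori "squirrels fish"'))
```

A parser built on the productions above would consume these pairs; deciding whether a "reserved" token is acting as a boolean or as a search term is left to that parser.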
Context Sets
See: List of All
Context Sets | CQL
Context Set
CQL is so-named ("Contextual Query Language") because it is
founded on the concept of searching by semantics or context, rather than
by syntax. The same search may be performed in a different way on very
different underlying data structures in different servers, but the important
thing is that both servers understand the intent behind the query. In
order for multiple communities to define their own semantics, CQL uses
Context Sets in order to ensure cross-domain interoperability.
Context sets permit CQL users to create their own indexes, relations,
relation modifiers and boolean modifiers without fear of choosing the same
name as someone else and thereby having an ambiguous query. All of these
four aspects of CQL must come from a context set; however, there are rules
for determining the prevailing default if one is not supplied. Context
sets allow CQL to be used by communities in ways which the designers could
not have foreseen, while still maintaining the same rules for parsing
which allow interoperability.
When defining a new context set, it is necessary to provide a description
of the semantics of each item within it. While context sets may contain
indexes, relations, relation modifiers and boolean modifiers, there is
no requirement that all should be present; in fact it is expected that
most context sets will only define indexes.
Each context set has a unique identifier, a URI. When sending the context
set in a query, a short form is used. These short names may be sent as
a mapping within the query itself, or be published by the recipient of
the query in some protocol dependent fashion. The prefix 'cql' is reserved
for the base CQL context set, but authors may wish to recommend a short
name for use with their set.
An index, relation, or modifier qualified by a context is represented
in the form prefix.value, where prefix is a short
name for a unique context set identifier.
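Expanding a prefix.value name into its context set identifier might be sketched as follows. The function name and the prefix map are illustrative; the dc identifier is simply the one used in the prefix assignment example under Query Syntax, and an unqualified name is returned with None so that the applicable default can be applied by the caller.

```python
def resolve_name(name, prefix_map):
    """Map 'prefix.value' to (context set identifier, value)."""
    prefix, dot, value = name.partition(".")
    if not dot:
        # No prefix: the prevailing default applies.
        return (None, name)
    key = prefix.lower()   # short names are case insensitive
    if key not in prefix_map:
        raise KeyError("unknown context set prefix: " + prefix)
    return (prefix_map[key], value)


prefix_map = {"dc": "info:srw/context-sets/1/dc-v1.1"}
print(resolve_name("dc.title", prefix_map))
print(resolve_name("title", prefix_map))
```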
Conformance/Base Profile
In order to claim conformance to CQL a server must support one of the
following three levels:
Level 0
- Must be able to process a term-only query.
(The term is either a single word or, if it consists of multiple words
separated by spaces, the entire search term is quoted. If the term includes
quote marks, they must be escaped by preceding them with a backslash,
e.g. "raising the \"titanic\"".)
- If an unsupported query is supplied, must be able to respond with
a diagnostic to say that the query is not supported.
Level 1
- Support for Level 0.
- Ability to parse both:
(a) search clauses consisting of 'index relation searchTerm'; and
(b) queries where search terms are combined with booleans, e.g. "term1
AND term2"
- Support for at least one of (a) and (b).
Note that (b) does not necessarily include queries such as:
index relation term1 AND index relation term2
but rather queries where the search clauses are terms-only (do not include
index or relation).
Level 2
- Support for Level 1.
- Ability to parse all of CQL and respond with appropriate
diagnostics.
Note that Level 2 does not require support for all of CQL; it
requires that the server be able to parse all of CQL (and respond
with proper diagnostics for the parts not supported).
Note: Version 1.2 is the current
SRU and CQL version. These specifications are for both versions, 1.1
and 1.2, but are oriented to version 1.2 with version 1.1 exceptions
annotated. For a full version 1.1 specification see Version
1.1 Archive.