searchRetrieve Operation (SRU Version 1.2 Specifications)
The searchRetrieve operation is the main operation in SRU. It allows the client to submit a search and retrieve request for matching records from the server.
This example is a search for the term "dinosaur", requesting that at most one record be returned, according to the 'dc' schema
The response to an SRU searchRetrieve request is an XML document. The table below provides a summary and description of the elements provided by the XML document. The "Type" column indicates either an XML Schema type ("xsd:") or a type defined within the schema.
All records in SRU are transferred in XML. Records are not assumed to be stored in XML. Records which are not natively XML must be first transformed into XML before being transferred. Records in the response may be expressed as a single string, or as embedded XML. If a record is transferred as embedded XML, it must be well-formed and should be validatable against the record schema.
The records parameter in the response is a sequence of record elements, each of which contains either a record or a surrogate diagnostic explaining why that particular record could not be transferred. If the record schema is unknown or the record cannot be rendered in that schema, then the server MUST return a diagnostic.
Each record element is structured into the following elements:
An example record, in the simple Dublin Core schema, packed as XML:
<dc:title>This is a Sample Record</dc:title>
In order that records which are not well formed do not break the entire message, it is possible to request that they be transferred as a single string with the <, > and & characters escaped to their entity forms. Moreover some toolkits may not be able to distinguish record XML from the XML which forms the response. However, some clients may prefer that the records be transferred as XML in order to manipulate them directly with a stylesheet which renders the records and potentially also the user interface.
This distinction is made via the recordPacking parameter in the request. If the value of the parameter is 'string', then the server should escape the record before transfering it. If the value is 'xml', then it should embed the XML directly into the response. Either way, the data is transfered within the 'recordData' field. If the server cannot comply with this packing request, then it must return a diagnostic.
SRU does not assume support of persistent result sets -- that a result set created by one request may necessarily be accessed by a client in a subsequent request. SRU does expect the server to state whether or not it supports persistent result sets, and if so the result set model described below is required.
There are applications in which result sets are critical; on the other hand there are applications in which result sets are not viable. An example of the first might be scientific investigation of a database with comparison of data sets produced at different times. An example of the latter might be a very frequently used database of web pages in which persistent result sets would be an impossible burden on the infrastructure due to the frequency of use.
Even if the server does not make result sets available for public manipulation, the following model is also important to understand in order to allow a single request to both match records and then sort them.
Processing of a query results in the selection of a set of records, represented by a result set maintained at the server; logically it is an ordered list of references to the records. Once created, a result set cannot be modified. Any operation which would somehow change a result set instead creates a new result set. Each result set is referenced via a unique identifying string, generated by the server when the result set is created.
From the client's point of view, the result set is a set of records each referenced by an ordinal number, beginning at 1. The client may request a given record from a result set according to a specific schema. For example the client may request record 1 in Dublin Core, and subsequently request record 1 in MODS. The requested schema is not a property of the result set (nor of the requested records as a member of the result set); the result set is simply the ordered list of records.
A record might be deleted or otherwise become unavailable while a result sets which references that record still exists. If a client then requests that record, the server is expected to supply a surrogate diagnostic in place of the record. For example, if the record at position 2 in a result set is deleted and then a client requests records 1 through 3, the server should supply, in order: record 1, a surrogate diagnostic for record 2, record 3.
The records in a result set are not necessarily ordered according to any specific or predictable scheme, unless it has been created with a request that contains a sort specification as part of the query. See sorting in CQL for more information regarding the specifics of sorting. If search and sort specifications are supplied on the same request then only the final sorted result set is considered to exist, even if the server internally creates a result set and then sorts it.
If the server supports result sets, it may include a resultSetId in the searchRetrieve response, along with an idle time described below. If another query is submitted then the server will again supply a result set id. If the result of the query would modify an existing result set (for example, a request to sort an existing result set), then the server must supply a new id for this new set. The server should maintain unique names for each result set created, even if the result sets no longer exist, such that clients do not mistakenly request records from the new set when meaning to refer to the previous set with the same identifier.
The server may supply an idle time along with a result set. The server is making a good-faith estimate that the result set will remain available and unchanged (both in content and order) until a timeout (a period of inactivity exceeding the idle time). The idle time is an integer representing seconds; it must be a positive integer, and should not be so small that a client cannot realistically reference the result set again. If the server does not intend that the result set be referenced, it should omit the result set identifier in the response.
August 22, 2008