The ZNG Initiative

"Z39.50 Next Generation"

Updated: July 12, 2001

A group of Z39.50 implementors has been informally discussing ways to evolve Z39.50 into a more mainstream protocol, attractive to information providers, vendors, and users. These discussions began at the December 2000 ZIG meeting and have continued since. The group met June 29-30 to define specifications for a new web service definition based on Z39.50 together with web technologies: XML, URI, SOAP (RPC), and HTTP. The specification will be called ZNG, "Z39.50 Next Generation". (Earlier, it had been using the code-name ZML: "Z39.50 over XML".)

Goals and Focus
This is a proof-of-concept initiative whose basic goal is to develop a standard search and retrieve service enabling development of value-added applications such as the scholar's portal that will integrate access to various networked resources. More specifically, the goal is to lower the barriers to implementation while preserving the existing intellectual contributions of Z39.50 that have accumulated over nearly 20 years, discarding those aspects no longer useful or meaningful.

This initiative recognizes the importance of Z39.50 (as currently defined and deployed) for business-to-business communication and focuses on getting information to the user (business-to-consumer).

The service will provide semantics for searching databases containing metadata and objects, both text and non-text. Building on Z39.50 semantics will enable the creation of gateways to existing Z39.50 systems while lowering the barriers for new information providers who wish to make their resources available via a standard search and retrieve service.

An important assumption is that data is modeled as structured and searchable via abstract access points. The service will not support queries based on relationships between objects (as for example in RDF data) except as those relationships are explicitly exposed in abstract access points.

Omitted and Retained Z39.50 Features
Several Z39.50 features either do not fit well in the contemporary implementation environment or are deemed outdated, and will be omitted from this initiative. These include:

ZNG will define a single web service that combines features of the Z39.50 Search and Present Services. The new service is (tentatively) named Search/Retrieve. The retained features from Search and Present are those that the group intends to implement. Additional features may be added later, and other Z39.50 Services may be defined later as new web services.

Note: the term service is used in two different contexts, "web service" and "Z39.50 Service". To distinguish these, we use "Service" (uppercase S) when we mean Z39.50 Service.

Search and Present are closely related, inter-dependent Services, relying on the same model. For simplicity, they have been combined into a single request/response pair. By contrast, if Scan were to be included in this initiative (it is not currently included) it would probably be defined as a separate web service; if in the future some of the Z39.50 Extended Services are to be defined, they would probably be defined as separate web services.

Part of the rationale for splitting Z39.50 Services into multiple web services is the fact that connections will not be maintained. Conversely, part of the reason that Z39.50 Services were bundled into a single protocol was its connection orientation, which was important 10-15 years ago but is no longer seen as so important.

Related to the connectionless nature of ZNG, sessions will not be maintained either. The only state aspect retained will be the result set. Each invocation of the Search/Retrieve service will be a request/response sequence, either via an HTTP URL using HTTP GET, or an XML/SOAP/RPC message using HTTP POST. Different invocations will not be related to one another except that a result set created by one invocation may be referenced by a subsequent invocation.

ZNG will not distinguish between a server and a database. (A search request applies to a database which implicitly corresponds to a server.) As a result of this simplification, Explain will be significantly simplified, and we hope it will therefore become more widely implemented.

The Z39.50 concept of record syntax is not meaningful in ZNG and is discarded; all ZNG records will be retrieved according to a single record syntax: XML. Multiple record schemas will be supported.

ZNG will specify string queries. The query language is (tentatively) called CQL, "Common Query Language". It will be a human-readable string representation of queries, based loosely on CCL (just the query, however, with no commands) with access points defined. Flat access points will be defined, rather than utilizing attribute vectors as in standard Z39.50. For example, consider 'title - word' and 'title - phrase'. In ZNG these would be represented as distinct access points (rather than two attribute combinations with the same Use attribute and different qualifying attributes).
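The difference between flat access points and attribute vectors matters mostly to a gateway that fronts an existing Z39.50 system. The following is a minimal sketch of such a mapping, not part of any ZNG draft: the access point names `title.word` and `title.phrase` are hypothetical, and the vectors assume the Bib-1 attribute set (attribute type 1 is Use, where 4 means Title; attribute type 4 is Structure, where 1 means phrase and 2 means word).

```python
# Hypothetical flat access point names mapped to Bib-1 attribute combinations.
# Keys of the inner dicts are Bib-1 attribute types (1 = Use, 4 = Structure).
FLAT_ACCESS_POINTS = {
    "title.word": {1: 4, 4: 2},    # Use = Title, Structure = word
    "title.phrase": {1: 4, 4: 1},  # Use = Title, Structure = phrase
}


def to_attribute_vector(access_point):
    """Return the (type, value) pairs a gateway would send for an access point."""
    try:
        attributes = FLAT_ACCESS_POINTS[access_point]
    except KeyError:
        raise ValueError(f"unsupported access point: {access_point}") from None
    return sorted(attributes.items())
```

A client of the flat scheme only ever sees the two names; the shared Use attribute and the differing Structure attribute stay inside the gateway.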

Some of the features of Z39.50 that are retained in ZNG are:

Description of Search/Retrieve
The Search/Retrieve request may be submitted either as a URL or as an XML record wrapped in SOAP. The response will always be an XML record, possibly with embedded database records. (If the response was to a SOAP request, then it will have a SOAP wrapper also.)
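As an illustration of the SOAP form of the request, the sketch below wraps a set of request parameters in a SOAP 1.1 envelope using Python's standard library. The element name `searchRetrieveRequest` is hypothetical, and the parameter names are borrowed from the tentative URL form given later in this document; the actual XML request format is still to be defined.

```python
import xml.etree.ElementTree as ET

# SOAP 1.1 envelope namespace.
SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"


def wrap_in_soap(params):
    """Serialize request parameters as an XML record inside a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    # Hypothetical element name for the request record.
    request = ET.SubElement(body, "searchRetrieveRequest")
    for name, value in params.items():
        ET.SubElement(request, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")
```

The resulting string would be sent as the body of an HTTP POST; the same parameters could equally be sent as a URL query string with HTTP GET.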

The information conveyed in the request will be:

The information conveyed in the response will be:

Query/Result-Set Naming and Referencing

A Search/Retrieve request includes a CQL string representing either a query that the client wants the server to execute, or a string corresponding to a query that has already been executed. (However, from the protocol's point-of-view, there is no distinction.) The server, upon receiving the request, might execute the query, or might decide that there is already a stored result set corresponding to the supplied query string (regardless of whether it was the current client or another who originated the earlier query request that caused the result set to be created). The server may decide to use retained results rather than re-execute the query, and this decision is entirely at the server's discretion and is transparent to the protocol.

Note: the CQL string parameter is to be present in every Search/Retrieve request, except in the special case for Explain, which is described below.

If the server executes the query, then in the response, the server may (but is not obligated to) include a result set name (note the difference from traditional Z39.50: the server names result sets, not the client). If the server does not intend that the result set will remain reasonably static, then it should not supply a result set name. (However, the definition of "reasonably static" is completely up to the server; if the server does supply a result set name this does not constitute a guarantee that the results will remain stable).

When the client subsequently wishes to retrieve records from the result set, it may send a Search/Retrieve request which includes either the same CQL string, or a CQL string that includes the server-supplied result set name if one was supplied. (See: "Which Records to Include in Response" below.)

Note: The CQL syntax is not yet defined but it will include the result set name. It will support both the capability to qualify a result set (e.g. "records in result set 'A' where title is 'B' ") and to specify only a result set name (e.g. "records in result set 'A'") analogous to a Z39.50 present.
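One way a server might implement this behavior is sketched below. It is only an illustration under stated assumptions: the `execute` callable, the `rs1`, `rs2`, ... naming scheme, and the choice to reuse a retained result set whenever the query string matches exactly are all hypothetical; the protocol leaves these decisions to the server.

```python
import itertools


class ResultSetStore:
    """Retains result sets and assigns them server-chosen names."""

    def __init__(self, execute):
        self._execute = execute      # hypothetical: query string -> list of records
        self._name_by_query = {}     # query string -> result set name
        self._records_by_name = {}   # result set name -> list of records
        self._counter = itertools.count(1)

    def search(self, query):
        """Return (result_set_name, records), reusing a retained set if present."""
        name = self._name_by_query.get(query)
        if name is None:
            records = self._execute(query)
            name = f"rs{next(self._counter)}"
            self._name_by_query[query] = name
            self._records_by_name[name] = records
        return name, self._records_by_name[name]

    def by_name(self, name):
        """Look up a retained result set by its server-supplied name."""
        return self._records_by_name[name]
```

Whether the second request repeats the original query string or cites the result set name, the client cannot tell if the query was re-executed, which matches the transparency described above.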

Result Set Idle Times

In each Search/Retrieve response the server may include a result set idle time value indicating a projected length of time (but not a guarantee) that the result set will remain available if it is not referenced. Once the result set is referenced in a subsequent Search/Retrieve request, that response may include a new value for the result set idle time.
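The idle time can be pictured as a countdown that restarts each time the result set is referenced. The sketch below assumes a fixed idle time in seconds and an injectable clock; the class and method names are hypothetical, and a real server remains free to discard a result set earlier or keep it longer, since the value is a projection rather than a guarantee.

```python
import time


class ResultSetIdleTracker:
    """Tracks when each result set was last referenced."""

    def __init__(self, idle_seconds, clock=time.monotonic):
        self.idle_seconds = idle_seconds
        self._clock = clock
        self._last_referenced = {}

    def reference(self, name):
        """Record a reference; return the idle time to report in the response."""
        self._last_referenced[name] = self._clock()
        return self.idle_seconds

    def is_available(self, name):
        """True while the projected idle time since the last reference has not elapsed."""
        last = self._last_referenced.get(name)
        return last is not None and self._clock() - last < self.idle_seconds
```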

Which Records to Include in Response

The Search/Retrieve request may specify that records are to be included in the response. The request may include a starting point (ordinal record number) and maximum number of records (in contrast to traditional Z39.50, where the actual desired number is specified). For example, if the request includes a Result Set Start Point with value 3 and Maximum Number of Records with value 10, then the client requests that records 3, 4, 5, ..., possibly up to 12, be included in the response. Both parameters must be included together (i.e. if one is supplied then the other must also be supplied). If these parameters are omitted then no records are to be included in the response (for example, when the client's sole intent is to learn the number of records in the result set).
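The arithmetic can be sketched as a small function that returns the ordinal record numbers a response would carry. The function name is hypothetical, and returning an empty list when the start point lies beyond the end of the result set is an assumption; the text above does not say how a server should treat that case.

```python
def records_to_include(result_set_size, start_point=None, maximum_records=None):
    """Return the 1-based ordinal numbers of the records to include."""
    if (start_point is None) != (maximum_records is None):
        raise ValueError("start point and maximum records must be supplied together")
    if start_point is None:
        return []  # client only wants the result count
    if start_point < 1 or maximum_records < 0:
        raise ValueError("start point must be >= 1 and maximum records >= 0")
    # The maximum is an upper bound; the result set may end sooner.
    last = min(start_point + maximum_records - 1, result_set_size)
    return list(range(start_point, last + 1))
```

With a start point of 3 and a maximum of 10, a result set of 100 records yields records 3 through 12, while a result set of 5 records yields only records 3, 4, and 5.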

Schemas

The Search/Retrieve request may specify the format of the response - the Response Schema. ZNG will define a single response schema with hooks for extensibility features, for example to enable thin clients where the response is coupled with an appropriate style sheet.

The request will also specify the Record Schema, the schema for response records. Tentatively, this will at least include Dublin Core and Onix for database records, as well as an Explain schema (see below).

Explain

We envision that there will be more motivation to implement the ZNG version of Explain (than there was to implement the Z39.50-1995 Explain) because of the substantial simplification. Explain information will be static (not based on the conventional Z39.50 Explain concept of searching an Explain database for specific information), and will be analogous in concept to the ONE-2 Explain-Lite (though the information will be accessed via Search/Retrieve rather than Init, which is not part of ZNG). The simplification of Explain also owes much to ZNG's discarding of multiple databases and record syntaxes.

Explain information will include a list of supported access points, record schemas, and response schemas.

A client may request Explain information by a Search/Retrieve request, omitting the CQL string, supplying values of 1 for starting record and maximum records, and specifying the Explain Record schema as the requested record schema.
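The shape of such a request can be sketched as follows. The parameter names follow the tentative URL form given in this document, while the schema identifier `explain` and both function names are hypothetical placeholders, since the Explain record schema has not yet been defined.

```python
def explain_request_params(explain_schema="explain"):
    """Parameters for an Explain request: no CQL string, exactly one record."""
    return {
        "startRecord": 1,
        "maximumRecords": 1,
        "recordSchema": explain_schema,  # hypothetical schema identifier
    }


def is_explain_request(params, explain_schema="explain"):
    """Server-side check matching the rule described above."""
    return (
        "query" not in params
        and params.get("startRecord") == 1
        and params.get("maximumRecords") == 1
        and params.get("recordSchema") == explain_schema
    )
```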

 

URL Based Request

As noted above, a Search/Retrieve request is invoked either via an HTTP URL using HTTP GET, or an XML/SOAP/RPC message using HTTP POST.

The form of the URL is still to be defined (see below) and might resemble the following:

http://xxx?query=xxx&startRecord=nnn&maximumRecords=nnn&responseSchema=xxx&recordSchema=xxx
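A client might assemble such a URL as in the sketch below. Since the URL form is still to be defined, treat this purely as an illustration: the parameter names come from the example above, while the function name, the base URL `http://zng.example.org/catalog`, the query string `dinosaur`, and the schema identifier `dc` are placeholders.

```python
from urllib.parse import urlencode


def build_search_retrieve_url(base_url, query=None, start_record=None,
                              maximum_records=None, response_schema=None,
                              record_schema=None):
    """Build a Search/Retrieve GET URL, omitting parameters that are not set."""
    if (start_record is None) != (maximum_records is None):
        raise ValueError("startRecord and maximumRecords must be supplied together")
    candidates = [
        ("query", query),
        ("startRecord", start_record),
        ("maximumRecords", maximum_records),
        ("responseSchema", response_schema),
        ("recordSchema", record_schema),
    ]
    params = [(name, value) for name, value in candidates if value is not None]
    # urlencode percent-encodes reserved characters in the values.
    return f"{base_url}?{urlencode(params)}"


url = build_search_retrieve_url(
    "http://zng.example.org/catalog",  # placeholder base URL
    query="dinosaur",                  # placeholder; CQL syntax is not yet defined
    start_record=1,
    maximum_records=10,
    record_schema="dc",                # placeholder schema identifier
)
print(url)
# http://zng.example.org/catalog?query=dinosaur&startRecord=1&maximumRecords=10&recordSchema=dc
```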

 

Projected Deliverables

Draft specifications/definitions are planned to be available end-of-July for the following:

In August, prototypes may be implemented for the following:

Acknowledgements
The following people attended the June meeting and contributed to this report:

*Note: The Library of Congress is participating in this initiative; however, the Z39.50 Maintenance Agency is not officially supporting it. The Maintenance Agency is hosting this web site as a service to Z39.50 implementors.