Related Documents:
Version 1.1 July 9, 1999
Problems with the bib-1 attribute set began to surface at that time (1992). Within the bibliographic community, implementors had no published definitions of the bib-1 attribute semantics, thus vendors implemented the bib-1 attribute set with their own interpretations of the attribute usage. A document was produced to clarify this (Bib-1 Semantics Document), although it was never formally included as part of the standard.
As the Internet grew, more communities wanted to implement Z39.50 and, in turn, needed additional attributes (beyond those already in bib-1) to reflect the types of data they wanted to exchange. This proved difficult as Z39.50-1992 did not allow a query to include attributes from more than a single attribute set. Since bib-1 was the only publicly visible set, it was expanded to accommodate the needs of these communities. Thus, bib-1 grew without plan or rigor, evolving away from the bibliographic community where it had started, and "bib-1" became somewhat of a misnomer as it grew into a global set of attributes.
In 1994 and 1995, as Z39.50 version 3 was being finalized and as Z39.50 began to be widely implemented, additional concerns arose over the relationships among attribute sets that other groups were developing, notably the STAS and GILS attribute sets. The Z39.50 Implementors Group (ZIG) had many questions about the development and implementation of multiple attribute sets, including duplication of attributes across sets. In early 1996 a discussion paper by Cliff Lynch (Defining and Maintaining Attribute Sets for Use with the Z39.50 Protocol: A Discussion Paper ) detailed the issues:
The major conclusion of the group was that a new architecture for attribute sets should be developed; they went on to recommend an architecture based on classes of attribute sets, with expanded attribute types. Another major conclusion was that expert communities, rather than the ZIG, should be responsible for developing and maintaining attribute sets (following the example set by GILS and STAS). Notably, they recommended that the bibliographic community, rather than the ZIG, develop the next generation of bibliographic attributes. The ZIG should continue to be responsible for attributes that are general to Z39.50, that is, not specific to a given community.
The type-1 query consists of one or more search terms, each with a set of attributes, specifying, for example, the type of term (author, title, subject, etc.), whether the term is truncated, its structure, etc. The server is responsible for mapping attributes to the logical design of the database.
A term in a type-1 query, together with its accompanying collection of attributes, is called an operand. Operands may be combined in a type-1 query, linked by boolean operators (And, Or, And-not, and Proximity).
Each attribute is a pair: an attribute type and a value of that type. An Attribute set defines a set of attribute types, and for each type defines the set of possible values.
An attribute set definition is assigned an object identifier, referred to as its attribute set identifier.
Example: The bib-1 attribute set defines a number of attribute types; one of which is Use. For bib-1 Use attributes, many attribute values are defined, one of which is personal name. Each type is assigned a numeric value, and each value is assigned a numeric value: type Use is assigned the value 1, and Use attribute Personal Name is assigned the value 1. Thus bib-1 Use attribute Personal Name is represented as the pair (1,1). This pair is further qualified by the bib-1 attribute set identifier (1.2.840.10003.3.1) to distinguish it from the pair (1,1) that may be defined by another attribute set.
Version 2 of Z39.50 has two serious limitations inhibiting the development of attribute architecture, both corrected in version 3:
This architecture strongly recommends that an attribute set definition that conforms to a particular class but defines attribute types that are not defined for that class should carefully define the interactions between the new attribute types and existing types defined for that class.
The architecture provides the attribute-set-class approach to allow flexibility and future expansion within the existing architecture. It is believed that attribute set Class 1 meets all known needs for an attribute class at this time. There may be other approaches developed which partition the set of attributes into fundamentally different types. This might result in the definition of a new attribute class inconsistent with Class 1. However, no need for such a separate class has been identified and it is not known whether additional classes will be necessary.
An Attribute set may define the set of values for a particular attribute type as follows:
The purpose of enumerating all of the possible attribute types within this "universal" attribute class is to provide a template for developers of attribute sets, and to set up a framework for interoperability among independently defined attribute sets which are intended to serve various communities. In particular, it should be possible for groups of content experts to develop new Access Point attributes, ASN.1 datatypes, comparison operators, and perhaps Format/Structure attributes which fit comfortably within this framework. Based on the template defined here, server developers may recognize attribute types omitted in a query operand, as well as illegal repetitions or combinations of attributes of given types that would indicate a malformed query operand.
When an attribute set is developed in accordance with Class 1, its definition should state that it is a Class 1 attribute set. In addition, the definition may describe the rules that apply (when it is used as the dominant set) for intermixing of attributes from different sets within an operand or query. Attributes from attribute sets conformant to Class 1 should not be mixed, within a query operand, together with attributes from historical attribute sets defined prior to the development of this attribute architecture (e.g. bib-1).
Any Class 1 attribute set follows the rules prescribed for Class 1 that apply to attribute types defined for that set. However, a Class 1 attribute set need not define nor populate every attribute type defined for Class 1. A Class 1 attribute set may define as few as one attribute type, or as many as all of the attribute types defined for Class 1.
Thus no specific attribute type is mandatory in the sense that it must be included in an attribute set definition. (This use of the term mandatory is different from the use of mandatory to mean that a particular attribute type may not be omitted in an operand, as used in the Occurrence column of the table in 3.3. For example, the Comparison attribute type is mandatory in an operand.)
However, a Class 1 attribute set must use the numeric values in the "Type Number" column in the table in section 3.3, to represent the types; if any of these types is omitted in the attribute set definition, the definition should skip the value for that type rather than renumber.
An attribute set might be developed for an application or profile and may refer to values of a particular attribute type that are defined by a different attribute set. If all of the values of that type that are required by the application are already defined by that other attribute set, then that attribute t ype need not be defined for the new set.
There may often be a close relationship between the development of a profile for a particular application, and the development of an attribute set definition to support the application. The profile might refer to several attribute sets in describing how to construct query operands (or entire queries). Thus the attribute set definition is not, itself, responsible for specifying all of the details of searching for the application when those details involve attributes from different attribute sets; however, the attribute set may offer as much commentary as it deems necessary and appropriate, for example, it may explain why a particular attribute has been omitted from its definition (for example, because another attribute set has defined it). It might explain how certain attributes that are defined in the set are to be combined with attributes from other sets.
The presence or absence of any attribute should not imply the presence of any other attribute, whether of the same or a different type. (For example, the presence of an Access Point attribute should not imply the presence of an otherwise omitted Format/Structure attribute, even if the relationship seems obvious.)
Even in cases where there is only one legal value of a Format/Structure attribute, and when the client might expect the server to deduce that value, it should be explicitly supplied. An exception is when the ASN.1 type completely and unambiguously determines the format, for example when the ASN.1 type is INTEGER or GeneralizedTime, or when the Z39.50 Date/Time definition is supplied (as EXTERNAL); in these cases the Format/Structure attribute may be omitted. ASN.1 type InternationalString does not unambiguously determine the format.
An attribute set developer should determine all of the Format/Structure attribute values necessary to fully specify the term formats relevant to the attribute set, and for each, either include it as a Format/Structure value in the attribute set definition, or ensure that it is defined in another attribute set (and provide appropriate reference within the attribute set definition).
While repeatability may be permissible for a given attribute type, as a general principle, an attribute type should not be repeated as a substitute for Boolean operations. To amplify this point, an attribute definition might prescribe how to interpret, for example, multiple Access Point attributes in a single operand. The definition might prescribe (as examples):
The definition may include a semantic indicator, allowing a client to select among several semantic alternatives. However, none of those alternatives should be to construct separate operands (linked by boolean 'and' or 'or') for each Access Point attribute -- the type-1 query supports boolean operations, so allowing another means of specifying boolean operations would add unnecessary complexity (in contrast to potential semantic interpretations of multiple Access Point attributes which cannot be otherwise represented via the type-1 query, as in the examples above).
The presence of an Access Point attribute is mandatory in an operand.
An attribute set definition that defines this type should include a discrete list of values.
An Access Point attribute may be indicated as not anchored (matching may occur beginning at any node within the element tree) by nesting it within an Access Point attribute of value 'wildpath' (for example as defined in the Utility attribute set). In the absence of a wildpath attribute, it is considered anchored (matching must occur from the root of the element tree).
Example of Anchored vs. Not Anchored:
Suppose a schema includes elements Description, Contact, and Availability, where Description is unstructured (has no sub-elements), Contact is structured into sub-elements Name, eMail, and Description, and Availability is structured into sub-elements, one of which is Contact, similarly structured (leaf elements shown in bold):When the single Access Point attribute Description is specified as anchored, then it is intended to match first-level element Description; if multiple Access Point attributes Contact and Description are specified as anchored, then it is intended to match Description within first-level element Contact. If Contact and Description are specified as not anchored, then it may match Description within first-level element Contact, or Description within Contact within Availability. If the single Access Point attribute Description is specified as not anchored, then it may match first-level element Description, Description within first-level element Contact, or Description within Contact within Availability.
- Description
- Contact
- Name
- Description
- Availability
- Contact
- Name
- Description
- .....
When the operand includes nested access points, each semantic qualifier value applies to the entire access point specification, that is, to the set of nested Access Point attributes.
The client may indicate that the server may ignore the Semantic Qualifier(s) by including a null Semantic Qualifier (see Utility attribute set) thus allowing the server to match the Access Point attribute value with null, in effect rendering it unqualified. The Semantic Qualifier attributes are in no sense combined among themselves. They are not presented as a list of increasingly precise qualifiers. An attribute set definition that defines this type should include a discrete list of values. This attribute is repeatable.
The Semantic Qualifier attribute is distinguished from the Functional Qualifier attribute, and the distinction is described below.
The Semantic Qualifier and Functional Qualifer types correspond to "type" and "role" respectively. The Semantic Qualifier describes the term itself, while the Functional Qualifier describes the relationship of the term to the object being searched. For example, consider a search on an author, where the author is a person and thus the term is a personal name. The Access Point attribute value would be 'name', the Semantic Qualifier value would be 'personal name' and the Functional Qualifier value would be 'author'.
When a qualifier (semantic or functional) is to be defined and it is unclear whether it should be a Semantic Qualifier or Functional Qualifier, it is recommended that it be defined as a Semantic Qualifier.
Server preprocessing instructions should be included in this type, for example, "do not treat any words in this term as a stopword" and "do not remove punctuation".
An attribute set definition that defines this type should include a discrete list of values. This attribute is repeatable.
The presence of a Comparison attribute is mandatory in an operand, as it is presumed that there is always a relationship between the term and the value of the access point to which the term is compared (otherwise there would be no basis for comparison) and that the client knows the relationship; therefore, based on the principle stated in 3.1.3 ("Omitted Attributes") the client should always supply the relationship.
The Comparison attribute is a generalization of the bib-1 Relation attribute, though named differently to avoid confusion. (The bib-1 Relation attribute is not mandatory in bib-1, as bib-1 has no such rules of occurrence, nevertheless, there is always a relationship, implied or explicit. One of the problems with bib-1, that Class 1 tries to correct, is the potential ambiguity when the relationship is not supplied.)
An attribute set definition that defines this type should include a discrete list of values. This attribute is non-repeatable.
Sample values might include:
Developers of specific Access Point attributes should consider defining (or utilizing existing) ASN.1 datatypes to support their applications -- for example, personal names, dates, geospatial information (points and polygons). There will of course be cases where the ASN.1 approach to datatyping will be too heavy-weight; in those cases the Format/Structure attribute type can be used in conjunction with ASN.1 type InternationalString to indicate that the content of a string represents data in a specific format. However, a character string term should not be used to represent an integer (e.g. to represent the integer 123, the term should assume ASN.1 type INTEGER, rather than the character string '123'.)
Personal names are an interesting boundary case where one might argue either for an ASN.1 based definition or a Format/Structure attribute indicating a normalized name according to some rules; the choice of the appropriate approach is best left to a bibliographic-attribute-definition working-group.
An attribute set definition that defines this type should include a discrete list of values. This attribute is non-repeatable.
Whenever the 'Character String' Format/Structure attribute is supplied:
An attribute set definition must use the numeric values in the "Type Number" column below to represent the types. If any of these types is omitted in an attribute set definition, the definition should skip the value for that type rather than renumber.
In the "value" column, 'list' means that when an attribute set defines that type, the attribute set definition should include a discrete list of values for the type.
In the Repeatable column, if the value is "yes" (meaning that the attribute type is repeatable) an attribute set definition may declare the type to be non-repeatable, but if the value is "no", an attribute set may not declare the type to be repeatable. (When an attribute set definition declares a type non-repeatable, this means that the attribute type may not repeat within any operand of a query, when the attribute set is specified as the dominant set for the query.)
In the Occurrence column, "mandatory" means that the attribute type must occur in an operand (it does not mean that a given attribute set must define that type). "Optional" means that in general the attribute type need not occur in every operand; however, a specific attribute set definition may declare that the attribute is mandatory (or mandatory in certain circumstances) in which case, the rules specified would be in effect when the attribute set is specified as the dominant set for a query. An attribute set definition may not declare an attribute type to be optional if it is listed as "mandatory" in this table.
The numerical order of attributes types (as listed in this table) differs from the order in which they are defined in section 3.2.1: attribute type Functional Qualifier is new in version 1.1 (it was not defined in version 1.0) and was added to the table at the end, to avoid re-numbering.
Attribute Type | Type Number | Value | Repeatable | Occurrence | Roughly-corresponding Bib-1 Type |
---|---|---|---|---|---|
Access Point | 1 | list | yes | mandatory | Use |
Semantic Qualifier | 2 | list | yes | optional | (new) |
Language | 3 | list | yes | optional | (new) |
Content Authority | 4 | list | yes | optional | (new) |
Expansion/Interpretation | 5 | list | yes | optional | Truncation and some of Relation |
Normalized Weight | 6 | numeric | no | optional | (new) |
Hit Count | 7 | numeric | no | optional | (new) |
Comparison | 8 | list | no | mandatory | most of Relation and part of Completeness |
Format/Structure | 9 | list | no | optional | Structure |
Occurrence | 10 | numeric | no | optional | (loosely) Completeness |
Indirection | 11 | list | no | optional | (new) |
Functional Qualifier | 12 | list | yes | optional | (new) |
The Utility set will define commonly used values for the Class 1 types. In addition it will include metatdata access points for records, as distinguished from metatdata access points for resources; the latter is the province of the Cross Domain set.
This distinction between record and resource is characterized by the example of a MARC record that describes a document. The MARC record is the "record" and the document is the "resource". This is not to imply that this architecture (or that Class 1) models record and resource as always distinct. When the record is the resource (e.g. in a document database) the metadata access points are the province of the Cross Domain set.
An example of an access point that characterizes the difference in purpose of the two sets is 'language': There will be a language access point in both sets. Utility set Access Point Language will refer to the language of the database record, while Cross Domain set Access Point Language will refer to the value of the language field. For example a MARC record, created in English, might describe a French book. The Utility Access Point attribute Language would refer to the language of the MARC record, while the Cross Domain Access Point attribute Language would refer to the language of the book (English and French, respectively). Another example: the "creator" of the MARC record is similarly distinguished from the "creator" (or author) of the book that the record describes.
The Cross-domain set is defined for use by cross-domain queries (where a single query is applied to multiple domains) and more generally, for attributes that apply to multiple domains, whether to be used in a cross-domain query or not. For example, Access point 'title' will be defined in the Cross Domain set and then need not be defined in any domain-specific set. Thus a bibliographic set need not define 'title'; rather, it could define semantic and/or functional qualifiers for 'title'; then, effective access point "bibliographic title" would be constructed using 'title' from the Cross Domain set and appropriate semantic and/or functional qualifiers from the bibliographic set.