NAME: Defining a Uniform Resource Name Field in the USMARC Bibliographic Format
SOURCE: Library of Congress
SUMMARY: This paper discusses issues concerning the definition of a field or subfield for a Uniform Resource Name, which is a location-independent, persistent identifier for an Internet resource. It considers whether the URN should be defined in the standard number block of fields (0XX) or as a subfield of field 856 (Electronic Location and Access).
KEYWORDS: Uniform Resource Name; Internet resources
STATUS/COMMENTS:
5/6/96 - Forwarded to USMARC Advisory Group for discussion at the July 1996 MARBI meetings.
7/6/96 - Results of USMARC Advisory Group discussion - The consensus of the group was not to define an element in MARC for URNs now, but that it was important for the MARC community to follow developments. Clifford Lynch reported about the recent progress within the Internet Engineering Task Force (IETF) to further define the URN and URN resolution systems. A new working group was established that has developed documents and attempts to use one or two resolution schemes (especially one based on the Domain Name System) for proof of concept, hoping to disentangle operational issues from syntactical ones. Although some participants felt that a URN belongs in an 0XX field that is repeatable, Lynch stated that there may not be a clear answer because of the multiplicity of naming assigners and the complexity of relationships between URNs and bibliographic records. It is possible that identifiers currently used in the bibliographic community (e.g. ISBNs) may be embedded in a URN scheme and used in the future to serve as URNs.
DISCUSSION PAPER NO. 96: Defining a Uniform Resource Name Field

1. BACKGROUND

The Internet Engineering Task Force (IETF) is the protocol engineering and development arm of the Internet. It includes network designers, operators, vendors, and researchers concerned with the development and smooth functioning of the Internet. Much of its work is done in working groups. Until summer 1995 the URI (Uniform Resource Identification) Working Group was chartered with developing a set of standards for the encoding of system-independent resource location and identification information for the use of Internet information resources. In mid-1995 the URI Working Group was dissolved, since it had reached its original goals and its current work was too broad to gain consensus. Its work was divided among separate working groups based on individual standards.

The Uniform Resource Name (URN) is the URI standard which deals with naming conventions. A URN is the name of a resource that identifies a unit of information independent of its location. Other elements in the URI architecture include location (URL) and description/metadata (URC, or Uniform Resource Characteristic). A URN resolution service would be used to retrieve information about the named resource. Within this architecture, URNs are used for identification, URCs for meta-information (in the library world, roughly equivalent to a bibliographic record), and URLs for locating or finding resources (now defined in MARC as field 856). There is general agreement that the concept of the URN, a persistent, unique name that can be used to provide a location for a resource, is what is needed for the future viability of electronic information retrieval.

2. DISCUSSION

URN requirements. URNs improve upon URLs because they are intended to provide a globally unique, location-independent identifier for the resource, and thus to facilitate access both to metadata ("data about data") about it and to the resource itself.
The URN refers to the intellectual entity, while the URL refers to a particular physical entity. Thus, a URN can refer to multiple copies of an object, or instantiations with only minor format or encoding variations, that would have to have separate URLs as separate physical objects. Persistence is desirable, and must be provided by naming authorities. The document RFC 1737: Functional Requirements for Uniform Resource Names by K. Sollins and L. Masinter <http://ds.internic.net/rfc/rfc1737.txt> lists the following requirements: global scope; global uniqueness; persistence; scalability; legacy support; extensibility; independence; resolution. URN requirements include requirements on their functional capabilities and requirements on the way they are encoded. To use a URN, there must be a resolution service that can map the name to the corresponding resource and return one or many locations.

URN schemes. The difficulty in bringing the URN work to an agreed-upon standard has revolved around the resolution services that will deploy them. Currently there are several proposed URN schemes, which have some aspects in common. The major differences seem to lie in the resolution mechanisms used. These schemes are:

1) Resource Cataloging and Distribution Service from the University of Tennessee;
2) Handle scheme from the Corporation for National Research Initiatives;
3) X-DNS-2 URN scheme, by Paul Hoffman and Ron Daniel, based on the Internet Domain Name System;
4) URN services, developed at OCLC and focusing on the syntax and functions of URNs;
5) Path-URN scheme from the National Center for Supercomputing Applications, which also makes use of the Internet Domain Name System;
6) Whois++, which uses the existing Whois++ system as an Internet Directory Service.

URN implementors have agreed that there will be multiple URN schemes, not a single canonical one. The scheme used is encoded as part of the URN.
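To make the role of a resolution service more concrete, the following is a minimal sketch in Python; it is not taken from any of the proposed schemes. It assumes the colon-separated form used in this paper's examples (urn:SchemeID:remainder). The names RESOLVERS, resolve_handle, and resolve, and the example.org locations, are hypothetical and chosen only for illustration.

```python
from typing import Callable, Dict, List

# A resolver takes the scheme-specific part of a URN and returns
# zero, one, or many locations (URLs) for the named resource.
Resolver = Callable[[str], List[str]]


def resolve_handle(name: str) -> List[str]:
    """Toy stand-in for a handle resolver (hypothetical data)."""
    table = {
        "cnri.dlib/august95": [
            "http://example.org/site-a/august95",
            "http://example.org/site-b/august95",
        ],
    }
    return table.get(name, [])


# Hypothetical registry: one resolver per scheme identifier.
RESOLVERS: Dict[str, Resolver] = {"hdl": resolve_handle}


def resolve(urn: str) -> List[str]:
    """Pick a resolver from the scheme encoded in the URN and call it."""
    parts = urn.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not in urn:SchemeID:... form: {urn!r}")
    scheme, name = parts[1].lower(), parts[2]
    resolver = RESOLVERS.get(scheme)
    if resolver is None:
        raise LookupError(f"no resolver registered for scheme {scheme!r}")
    return resolver(name)
```

Because the scheme travels inside the name, a client in this sketch needs no outside knowledge to choose a resolver; a name whose scheme has no registered resolver simply cannot be resolved.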
URN groups reached outline agreement on most of the major issues at a meeting in October 1995 at the University of Tennessee. The consensus reached allows users to incorporate URNs from existing naming schemes in documents and on-line systems without being concerned that they will later have to reformat or modify existing URNs. The framework that was agreed upon will continue to support existing URNs through resolution systems. That framework will evolve further and it allows for different naming approaches.

URN framework. In recent agreements, some general principles have been established concerning URNs.

1) Both a naming scheme (a procedure for creating and assigning unique URNs conforming to a specified syntax) and a resolution system (a network-accessible service that stores URNs and resolves them) are necessary for URNs to work.
2) Naming schemes are not tied to resolution systems. Any resolution system should be able to resolve a URN from any of the naming schemes.
3) Mechanisms need to be created so that the user of a URN can discover what resolution systems can resolve the URN. This URN registry is necessary because of the independence of naming schemes and resolution systems.
4) The naming authority determines the unique name. It has authority over naming conventions and assignment.

Syntax. Much agreement has been reached on the syntax of URNs. There are several fields that make up a URN:

1) The text string "URN:" (opinions differ as to whether this should be included as part of the name).
2) SchemeID: type of naming scheme used (e.g. hdl for Handle; path for Path URN scheme).
3) AuthorityID: name of individual, group or system within the SchemeID that is allowed to create ElementIDs. (This may be a domain name.)
4) ElementID: the element that will be resolved. It might be considered the name of the object, although it becomes a name only in conjunction with the other elements.
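The fields listed above can be illustrated with a small parser. This is only a sketch under stated assumptions: Urn and parse_urn are hypothetical names, and splitting the remainder at its last slash is an assumption that happens to fit the two example names given in this paper (urn:hdl:cnri.dlib/august95 and urn:path:/A/B/C/doc.html); it is not a rule taken from any URN scheme.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Urn:
    scheme_id: str     # e.g. "hdl" or "path"
    authority_id: str  # who may create ElementIDs within the scheme
    element_id: str    # the element that will be resolved


def parse_urn(text: str) -> Urn:
    """Split a URN of the form urn:SchemeID:AuthorityID/ElementID."""
    parts = text.split(":", 2)
    if len(parts) != 3 or parts[0].lower() != "urn":
        raise ValueError(f"not in urn:SchemeID:... form: {text!r}")
    scheme_id, rest = parts[1].lower(), parts[2]

    # Assumption: the last "/" separates AuthorityID from ElementID.
    authority, slash, element = rest.rpartition("/")
    if not slash or not element:
        raise ValueError(f"no ElementID found in {text!r}")

    # In the paper's path example the AuthorityID is itself a path and
    # keeps its trailing slash ("/A/B/C/").
    if scheme_id == "path":
        authority += "/"
    return Urn(scheme_id, authority, element)
```

For instance, parse_urn("urn:hdl:cnri.dlib/august95") yields the SchemeID hdl, the AuthorityID cnri.dlib, and the ElementID august95.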
There could be many objects with the same ElementID, so the SchemeID and AuthorityID are also necessary to make the URN unique. Different naming schemes can use different formats. The fields are separated by colons. Examples:

urn:hdl:cnri.dlib/august95 (SchemeID=handle scheme; the domain name cnri.dlib=AuthorityID; august95=ElementID)
urn:path:/A/B/C/doc.html (SchemeID=pathURN; AuthorityID is a path (/A/B/C/); ElementID=doc.html)

As the URN standard has been discussed but not completed, institutions have had to try to find solutions for the URL changeability problem. With the lack of a general resolution mechanism widely agreed upon, institutions have developed naming conventions and resolution techniques that make sense locally. Usually a unique name serves as an identifier to locate the item using locally developed software (i.e. the resolution system).

Implementation of URNs. There are several projects that are using URNs and attempting to resolve them. One is the handle server at CNRI, using the handle URN. Another is the OIL (Open Information Locator) project at the Cooperative Research Center's Research Data Network in Australia. The OIL project focuses on access to large scale collections of resources and uses standards-based protocols. In a Web browser it allows the user to click on a URN which connects to a resolution system and returns a resource description (or Uniform Resource Citation (URC)).

URLs and PURLs. Uniform Resource Locators (URL) have been widely used and accepted as a method for locating resources on the Internet. During the period of their development, Proposal No. 93-4 (Changes to the USMARC Bibliographic Format (Computer Files) to Accommodate Online Information Resources) defined a new MARC field 856 to accommodate this information. Since its publication in the USMARC Bibliographic Format, the field has been widely used in bibliographic records to provide links to the electronic resource.
One project that has experimented with the field is OCLC's INTERCAT project. Using URLs in bibliographic records has pointed out the problem of resources moving from one location to another, with locations themselves changing names or becoming obsolete. As a response to the problem, OCLC developed the Persistent Uniform Resource Locator (PURL) and established a PURL server. Instead of pointing directly to the location of a resource through a URL, the pointer is to an intermediate resolution service which associates the PURL with a specific URL. PURLs are not URNs, but they satisfy many of the requirements using technology that can be deployed now. The INTERCAT catalog is using PURLs in bibliographic records in field 856 to diminish the maintenance of URL information in records.

3. URNs and MARC.

Although the URN has not yet become a definite standard and has not become routinely deployed for all Internet resources, enough consensus has emerged that the MARC community might consider how to add the data element to the format. The process of standards approval is much different in the Internet Engineering Task Force than in other standards communities that librarians may be familiar with. In the IETF world, proposed standards are broadly used before actually being approved to ensure their viability. The USMARC Advisory Group approved field 856 before the URL was widely accepted, but it was well under development.

There are two places in the MARC bibliographic format to consider for a URN:

1) In the 0XX block of fields for standard numbers and codes
2) As a subfield of field 856.

It seems most appropriate to include a URN at the record level, that is in a 0XX field. The URN has often been compared to an ISBN or ISSN in that it is a persistent unique identifier. Field 026 could be defined for a Uniform Resource Name.

Alternatively, a new subfield could be defined in field 856.
The disadvantage to this approach is that it puts the URN at the copy level (field 856 is in the holdings block of fields, similar to field 852 for the library location). Field 856 may be repeated for each instance of the resource (e.g. if the resource were available from different sites or available using different access methods). If multiple 856 fields were in the record, the URN would then have to be repeated in each. The only subfields in field 856 that are available are subfields $e and $y.

Recently it has been suggested that PURLs would be used to resolve handles (one of the URN schemes). If this were the case, it is relevant that PURLs are already being recorded in field 856. In the INTERCAT database, the URL in subfield $u of the contributed record is being shifted to subfield $z and a PURL assigned and placed in 856$u. This situation needs to be considered in determining whether to use field 856 for the URN if the PURL needs to be associated with the URN. However, if an 0XX field were used for the URN, subfield $8 (Link and sequence number) could be used to link to the 856 data.

Another issue to consider is how URNs will be assigned. According to previous discussions within the URI working group concerning the assignment of URNs, name assignment is delegated to naming authorities that can determine naming conventions as well as when to assign a new name. Whether a new URN is assigned to an electronic resource that is another instance of an already existing object (e.g. another format, another access or encoding method, etc.) is the decision of the naming authority. This distributed model is similar to the assignment of ISBNs, which is the responsibility of the publisher. If field 026 were used and if more than one URN were assigned to an object that was considered one bibliographic entity for cataloging purposes, it is possible that the URNs would have to be linked to the appropriate electronic locations in field 856.
If so, linking subfield $8 could also be used for this purpose and a code for electronic link would need to be defined. On the other hand, the library community may wish to establish rules for when a new URN would be assigned, and perhaps only create a new bibliographic record for each object represented by a URN. However, since the naming authority determines URN assignment, bibliographic records could be created for entities that have URNs assigned by agencies outside the library community.

4. QUESTIONS

The following questions need to be considered.

1. Is it appropriate to define a field to accommodate the Uniform Resource Name now? Or should the USMARC Advisory Group wait until institutions want to begin inputting them into their records?
2. How might libraries that become naming authorities want to assign new URNs? What sort of guidelines would be needed, and how might they be developed? What types of institutions might become naming authorities?
3. If defined, should the URN be placed in field 026 (or other 0XX) or in a subfield of field 856? What are the implications of that decision?

Some of the material in this paper was taken from:

"Naming Conventions for Digital Resources", by Rebecca Guenther. <http://www.loc.gov/marc/naming.html>
"Uniform Resource Names: a progress report", by URN implementors. D-Lib Magazine, Feb. 1996. <http://www.dlib.org/dlib/february96/02arms.html>