DISCUSSION PAPER NO. 96

DATE: May 6, 1996
REVISED:

NAME: Defining a Uniform Resource Name Field in the USMARC Bibliographic Format

SOURCE: Library of Congress

SUMMARY: This paper discusses issues concerning the definition of a field or subfield for a Uniform Resource Name, which is a location-independent, persistent identifier for an Internet resource. It considers whether the URN should be defined in the standard number block of fields (0XX) or as a subfield of field 856 (Electronic Location and Access).

KEYWORDS: Uniform Resource Name; Internet resources

STATUS/COMMENTS:

5/6/96 - Forwarded to USMARC Advisory Group for discussion at the July 1996 MARBI meetings.

7/6/96 - Results of USMARC Advisory Group discussion - The consensus of the group was that an element for URNs should not be defined in MARC now, but that it was important for the MARC community to follow developments. Clifford Lynch reported on recent progress within the Internet Engineering Task Force (IETF) to further define the URN and URN resolution systems. A new working group was established that has developed documents and is attempting to use one or two resolution schemes (especially one based on the Domain Name System) for proof of concept, hoping to disentangle operational issues from syntactical ones. Although some participants felt that a URN belongs in an 0XX field that is repeatable, Lynch stated that there may not be a clear answer because of the multiplicity of naming assigners and the complexity of relationships between URNs and bibliographic records. It is possible that identifiers currently used in the bibliographic community (e.g. ISBNs) may be embedded in a URN scheme and used in the future to serve as URNs.

DISCUSSION PAPER NO. 96: Defining a Uniform Resource Name Field

1.     BACKGROUND

The Internet Engineering Task Force (IETF) is the protocol
engineering and development arm of the Internet.  It includes
network designers, operators, vendors, and researchers concerned
with the development and smooth functioning of the Internet.  Much
of its work is done in working groups.  Until summer 1995 the URI
(Uniform Resource Identification) Working Group was chartered with
developing a set of standards for the encoding of system-
independent resource location and identification information for
the use of Internet information resources.  In mid-1995 the URI
Working Group was dissolved, since it had reached its original
goals and its current work was too broad to gain consensus.  It was
divided into separate working groups based on individual standards.

The Uniform Resource Name (URN) is the URI standard which deals
with naming conventions.  A URN is the name of a resource that
identifies a unit of information independent of its location. 
Other elements in the URI architecture include location (URL) and
description/metadata (URC, or Uniform Resource Characteristic).   
A URN resolution service would be used to retrieve information
about the named resource.  Within this architecture, URNs are used
for identification, URCs for meta-information (in the library
world, roughly equivalent to a bibliographic record), and URLs for
locating or finding resources (now defined in MARC as field 856). 
There is broad agreement that the concept of the URN, a persistent,
unique name that a resolution service can map to a location for a
resource, is what is needed for the future viability of electronic
information retrieval.   


2.     DISCUSSION

URN requirements.  URNs improve upon URLs because they are intended
to provide a globally unique, location-independent identifier that
can be used to identify the resource, and thus to facilitate access
both to metadata ("data about data") about it and to the resource
itself.  The URN refers to the intellectual
entity, while the URL refers to a particular physical entity. 
Thus, a URN can refer to multiple copies of an object, or
instantiations with only minor format or encoding variations, that
would have to have separate URLs as separate physical objects. 
Persistence is desirable, and must be provided by naming
authorities.  RFC 1737, Functional Requirements for Uniform
Resource Names, by K. Sollins and L. Masinter
<http://ds.internic.net/rfc/rfc1737.txt>, lists the requirements
as: global scope; global uniqueness; persistence; scalability;
legacy support; extensibility; independence; resolution.  The
document covers both the functional capabilities of URNs and the
way they are encoded.

To use a URN, there must be a resolution service that can map the
name to the corresponding resource and return one or many
locations.  

URN schemes.  The difficulty in bringing the URN work to an agreed-
upon standard has revolved around the resolution services that will
deploy them.  Currently several URN schemes have been proposed, and
they have some aspects in common.  The major differences seem to
lie in the resolution mechanisms used.  These schemes are: 1)
Resource Cataloging and Distribution Service from the University of
Tennessee;  2) Handle scheme from the Corporation for National
Research Initiatives; 3) X-DNS-2 URN scheme, by Paul Hoffman and
Ron Daniel, based on the Internet Domain Name System; 4) URN
services, developed at OCLC and focusing on the syntax and
functions of URNs; 5) Path-URN scheme from the National Center for
Supercomputing Applications, which also makes use of the Internet
Domain Name System; 6) Whois++, which uses the existing Whois++
system as an Internet Directory Service.

       URN implementors have agreed that there will be multiple URN
schemes, not a single canonical one.  The scheme used is encoded as
part of the URN.  The URN groups reached agreement in outline on
most of the major issues at a meeting in October 1995 at the
University of Tennessee.  As a result of that consensus, users can
incorporate URNs from existing naming schemes in documents and
on-line systems without having to be concerned that they will later
have to reformat or modify them.  The framework that was agreed
upon will continue to support existing URNs through resolution
systems.  It allows for different naming approaches and is expected
to evolve further.

URN framework.  In recent agreements, some general principles have
been established concerning URNs.  
       1)  Both a naming scheme (a procedure for creating and
       assigning unique URNs conforming to a specified syntax) and a
       resolution system (a network-accessible service that stores
       URNs and resolves them) are necessary for URNs to work.  
       2)  Naming schemes are not tied to resolution systems.  Any
       resolution system should be able to resolve a URN from any of
       the naming schemes.  
       3)  Mechanisms need to be created so that the user of a URN
       can discover what resolution systems can resolve the URN. This
       URN registry is necessary because of the independence of
       naming schemes and resolution systems.
       4)  The naming authority determines the unique name.  It has
       authority over naming conventions and assignment.
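
The four principles above can be illustrated with a minimal sketch in
Python.  It is not taken from any IETF document: the class names
(Resolver, Registry), the method names, and the example.org locations
are all hypothetical, and it assumes that the "urn:" prefix is present
so that the SchemeID is the second colon-separated field.

```python
from collections import defaultdict


class Resolver:
    """Hypothetical resolution system: stores URNs and resolves them."""

    def __init__(self, name):
        self.name = name
        self._table = {}

    def register(self, urn, locations):
        # A URN may map to one or many locations.
        self._table[urn] = list(locations)

    def resolve(self, urn):
        # Unknown URNs resolve to an empty list of locations.
        return list(self._table.get(urn, []))


class Registry:
    """Hypothetical URN registry: tells a user which resolution
    systems can resolve URNs of a given naming scheme."""

    def __init__(self):
        self._by_scheme = defaultdict(list)

    def announce(self, scheme_id, resolver):
        self._by_scheme[scheme_id].append(resolver)

    def resolvers_for(self, urn):
        # Assumption: the URN starts with "urn:", so field 1 is the SchemeID.
        scheme_id = urn.split(":")[1]
        return list(self._by_scheme.get(scheme_id, []))


def resolve(registry, urn):
    """Ask every resolver the registry knows for this scheme."""
    locations = []
    for resolver in registry.resolvers_for(urn):
        locations.extend(resolver.resolve(urn))
    return locations


registry = Registry()
demo = Resolver("demo-resolver")
demo.register(
    "urn:hdl:cnri.dlib/august95",
    ["http://site-a.example.org/august95", "http://site-b.example.org/august95"],
)
registry.announce("hdl", demo)
```

Because the registry is keyed by SchemeID rather than by resolver, a
naming scheme is not tied to any one resolution system: a second
resolver could announce itself for "hdl" without any change to the
URNs already assigned.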

Syntax.  Much agreement has been reached on the syntax of URNs. 
There are several fields that make up a URN:
       1)  The text string "URN:".  Opinions differ as to whether
       this should be included as part of the name.
       2) SchemeID: type of naming scheme used (e.g. hdl for Handle;
       path for Path URN scheme)
       3) AuthorityID: name of individual, group or system within the
       SchemeID that is allowed to create ElementIDs. (This may be a
       domain name.)
       4) ElementID:  the element that will be resolved.  It might be
       considered the name of the object, although it becomes a name
       only in conjunction with the other elements.  There could be
       many objects with the same ElementID, so the SchemeID and
       AuthorityID are also necessary to make the URN unique.
Different naming schemes can use different formats.  The fields are
separated by colons.

Examples:
       urn:hdl:cnri.dlib/august95
       (SchemeID=handle scheme; the domain name
cnri.dlib=AuthorityID; august95=ElementID)

       urn:path:/A/B/C/doc.html
       (SchemeID=pathURN; AuthorityID is a path (/A/B/C/);
ElementID=doc.html)
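
The two examples can be taken apart with a short Python sketch.  It is
only an illustration under stated assumptions: the function name
parse_urn and the type ParsedURN are hypothetical, and because both
examples divide the AuthorityID from the ElementID with a slash rather
than a colon, the sketch splits at the last slash.  Other naming
schemes may well need a different rule.

```python
from typing import NamedTuple


class ParsedURN(NamedTuple):
    scheme_id: str
    authority_id: str
    element_id: str


def parse_urn(urn: str) -> ParsedURN:
    """Split a URN into SchemeID, AuthorityID and ElementID (sketch only)."""
    # The "urn:" prefix is treated as optional, since opinions differ
    # on whether it is part of the name.
    rest = urn[4:] if urn.lower().startswith("urn:") else urn

    # The SchemeID is everything up to the first colon.
    scheme_id, colon, remainder = rest.partition(":")
    if not colon or not scheme_id or not remainder:
        raise ValueError(f"not a recognizable URN: {urn!r}")

    # Assumption drawn from the two examples: the ElementID follows
    # the last slash of the scheme-specific part.
    head, slash, element_id = remainder.rpartition("/")
    if not slash or not element_id:
        raise ValueError(f"no ElementID found in: {urn!r}")

    # The path example shows its authority with the trailing slash
    # ("/A/B/C/"); the handle example shows it without ("cnri.dlib").
    authority_id = head + "/" if scheme_id == "path" else head
    return ParsedURN(scheme_id, authority_id, element_id)
```

Applied to the first example, parse_urn("urn:hdl:cnri.dlib/august95")
yields SchemeID "hdl", AuthorityID "cnri.dlib" and ElementID
"august95".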

Because the URN standard has been discussed but not completed,
institutions have had to find their own solutions to the problem
of changeable URLs.  In the absence of a widely agreed-upon general
resolution mechanism, institutions have developed naming
conventions and resolution techniques that make sense locally. 
Usually a unique name serves as an identifier to locate the item
using locally developed software (i.e. the resolution system). 

Implementation of URNs.  There are several projects that are using
URNs and attempting to resolve them.  One is the handle server at
CNRI, using the handle URN.  Another is the OIL (Open Information
Locator) project at the Cooperative Research Center's Research Data
Network in Australia.  The OIL project focuses on access to large
scale collections of resources and uses standards-based protocols. 
In a Web browser it allows the user to click on a URN which
connects to a resolution system and returns a resource description
(or Uniform Resource Citation (URC)).   

URLs and PURLs.  Uniform Resource Locators (URL) have been widely
used and accepted as a method for locating resources on the
Internet. During the period of their development, Proposal No. 93-4
(Changes to the USMARC Bibliographic Format (Computer Files) to
Accommodate Online Information Resources) defined a new MARC field
856 to accommodate this information.  Since its publication in the
USMARC Bibliographic Format, the field has been widely used in
bibliographic records to provide links to the electronic resource. 
One project that has experimented with the field is OCLC's INTERCAT
project.  Using URLs in bibliographic records has pointed out the
problem of resources moving from one location to another, with
locations themselves changing names or becoming obsolete.  

       As a response to the problem, OCLC developed the Persistent
Uniform Resource Locator (PURL) and established a PURL server. 
Instead of pointing directly to the location of a resource through
a URL, the pointer is to an intermediate resolution service which
associates the PURL with a specific URL. PURLs are not URNs, but
they satisfy many of the requirements using technology that can be
deployed now.  The INTERCAT catalog is using PURLs in bibliographic
records in field 856 to diminish the maintenance of URL information
in records.
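
The indirection that a PURL provides can be sketched in a few lines of
Python.  This is not OCLC's software: the class PurlServer, its
methods, and every example.org address are hypothetical, and the
network side of an actual PURL server is omitted.  The sketch shows
only the idea that the record keeps one stable address while the
association behind it is updated.

```python
class PurlServer:
    """Hypothetical in-memory stand-in for a PURL resolution service."""

    def __init__(self):
        self._targets = {}

    def assign(self, purl, url):
        # Create or update the association between a PURL and a URL.
        self._targets[purl] = url

    def resolve(self, purl):
        # Return the URL currently associated with the PURL.
        return self._targets[purl]


server = PurlServer()
purl = "http://purl.example.org/demo/report"

# The PURL is assigned once and recorded in the bibliographic record.
server.assign(purl, "http://old-host.example.org/report.html")
field_856_u = purl

# The resource moves.  Only the server's table is updated; the value
# stored in the record is left untouched.
server.assign(purl, "http://new-host.example.org/docs/report.html")

current_location = server.resolve(field_856_u)
```

The maintenance burden shifts from every record that cites the
resource to a single entry on the resolution service.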


3.     URNs AND MARC

Although the URN has not yet become a definite standard and has not
become routinely deployed for all Internet resources, enough
consensus has emerged that the MARC community might consider how to
add the data element to the format.  The process of standards
approval is much different in the Internet Engineering Task Force
than in other standards communities that librarians may be familiar
with.  In the IETF world, proposed standards are broadly used
before actually being approved to ensure their viability.  The
USMARC Advisory Group approved field 856 before the URL was widely
accepted, but it was well under development.  

There are two places in the MARC bibliographic format to consider
for a URN:
       1) In the 0XX block of fields for standard numbers and codes
       2) As a subfield of field 856.

It seems most appropriate to include a URN at the record level,
that is in a 0XX field.  The URN has often been compared to an ISBN
or ISSN in that it is a persistent unique identifier.  Field 026
could be defined for a Uniform Resource Name.  

Alternatively, a new subfield could be defined in field 856.  The
disadvantage to this approach is that it puts the URN at the copy
level (field 856 is in the holdings block of fields, similar to
field 852 for the library location).  Field 856 may be repeated for
each instance of the resource (e.g. if the resource were available
from different sites or available using different access methods). 
If multiple 856 fields were in the record, the URN would then have
to be repeated in each.  The only subfields in field 856 that are
available are subfields $e and $y.  Recently it has been suggested
that PURLs could be used to resolve handles (one of the URN
schemes).  This is relevant because PURLs are already recorded in
field 856: in the INTERCAT database, the URL in subfield $u of the
contributed record is being shifted to subfield $z, and a PURL is
assigned and placed in 856$u.  This situation needs to be
considered in determining whether to use field 856 for the URN if
the PURL needs to be associated with the URN.  However, if an 0XX
field were used for URN, subfield $8 (Link and sequence number)
could be used to link to the 856 data.

Another issue to consider is how URNs will be assigned.  According
to previous discussions within the URI working group concerning the
assignment of URNs, name assignment is delegated to naming
authorities that can determine naming conventions as well as when
to assign a new name.  Whether a new URN is assigned to an
electronic resource that is another instance of an already existing
object (e.g. another format, another access or encoding method,
etc.) is the decision of the naming authority.  This distributed
model is similar to the assignment of ISBNs, which is the
responsibility of the publisher.  If field 026 were
used and if more than one URN were assigned to an object that was
considered one bibliographic entity for cataloging purposes, it is
possible that the URNs would have to be linked to the appropriate
electronic locations in field 856.  If so, linking subfield $8
could also be used for this purpose and a code for electronic link
would need to be defined.  On the other hand, the library community
may wish to establish rules for when a new URN would be assigned,
and perhaps only create a new bibliographic record for each object
represented by a URN.  However, since the naming authority
determines URN assignment, bibliographic records could be created
for entities that have URNs assigned by agencies outside the
library community. 
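
The linking case described above, in which more than one URN is
assigned to a single bibliographic entity, can be sketched in Python.
Everything in the sketch is an assumption made for illustration: field
026 is only a possibility raised in this paper, subfield $a is a
guessed home for the URN within it, the $8 values are bare link
numbers without the link type code that would still need to be
defined, and the URNs and locations are invented.

```python
# A record reduced to (tag, subfields) pairs.  All values are hypothetical.
record = [
    ("026", {"8": "1", "a": "urn:hdl:example.authority/item-html"}),
    ("026", {"8": "2", "a": "urn:hdl:example.authority/item-ps"}),
    ("856", {"8": "1", "u": "http://host-a.example.org/item.html"}),
    ("856", {"8": "1", "u": "http://host-b.example.org/item.html"}),
    ("856", {"8": "2", "u": "ftp://host-a.example.org/item.ps"}),
]


def locations_by_urn(fields):
    """Group 856 $u locations under the URN that shares their $8 value."""
    # Map each link number to the URN carried in the 026 field.
    urn_for_link = {sub["8"]: sub["a"] for tag, sub in fields if tag == "026"}

    # Start every URN with an empty list so unlinked URNs still appear.
    grouped = {urn: [] for urn in urn_for_link.values()}

    for tag, sub in fields:
        if tag == "856" and sub.get("8") in urn_for_link:
            grouped[urn_for_link[sub["8"]]].append(sub["u"])
    return grouped


links = locations_by_urn(record)
```

With the URN held once in an 0XX field, two 856 fields can share a
link number without the URN being repeated in each, which is the
repetition that the 856 subfield approach would require.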



4.     QUESTIONS
The following questions need to be considered.

1.  Is it appropriate to define a field to accommodate the Uniform
Resource Name now?  Or should the USMARC Advisory Group wait until
institutions want to begin inputting them into their records?

2.  How might libraries that become naming authorities want to
assign new URNs?  What sort of guidelines would be needed, and how
might they be developed? What types of institutions might become
naming authorities?  
 
3.  If defined, should the URN be placed in field 026 (or other
0XX) or in a subfield of field 856? What are the implications of
that decision?

Some of the material in this paper was taken from:
"Naming Conventions for Digital Resources", by Rebecca Guenther.  
<http://www.loc.gov/marc/naming.html>
"Uniform Resource Names: a progress report", by URN implementors. 
D-Lib Magazine, Feb. 1996.
<http://www.dlib.org/dlib/february96/02arms.html>

Library of Congress Help Desk (09/03/98)