NAME: Renaming of subfield 856$u to accommodate URNs
SOURCE: National Digital Library Program
SUMMARY: This paper proposes several approaches to recording a URN in a MARC bibliographic record. For the current need to record a handle, field 856 subfield $u is used. In the future, fields for other identifiers may be used (e.g. ISBN, ISSN, SICI), and at that time a content designator (perhaps subfield $u in the appropriate field) may be added to indicate that the identifier serves as a URN. This proposal suggests changing the name of 856$u to URI to accommodate both URLs and URNs and adding an indicator value blank (#) to show that the access method is not provided for use with a URN.
KEYWORDS: Field 856 (Bibliographic/Holdings/Classification/Community Information); Electronic Location and Access; Subfield $u, in field 856 (Bibliographic/Holdings/Classification/Community Information); URI; Uniform Resource Name
RELATED: DP96 (June 1996)
STATUS/COMMENTS:
5/1/97 - Forwarded to USMARC Advisory Group for discussion at the 1997 Annual MARBI meetings.
6/29/97 - Results of USMARC Advisory Group discussion - Approved as amended. A separate subfield should be used for the URN. LC should survey USMARC users to see if subfield $g has been used. If not, it can be reused for the URN; if so, another subfield will be chosen. Guidance should be given on how to code the indicator if there is both a URL and a URN in the field. It is important to note that this proposal approves a URN subfield at the 856 level, rather than the record level. As the URN standard moves forward, additional proposals will deal with the definition of a URN element when it relates to the entire resource described in the record.
8/21/97 - Result of final LC review - Agreed with the MARBI decision.
PROPOSAL NO. 97-9: Renaming of Field 856$u to accommodate URNs

1. BACKGROUND

Discussion Paper No. 96 (Defining a Uniform Resource Name Field in the USMARC Bibliographic Format) explored issues concerning the definition of a data element in MARC for a Uniform Resource Name (URN). It considered the need to define a data element, either in a new field in the standard number block of fields (0XX) or as a subfield of field 856 (Electronic Location and Access).

The Uniform Resource Name (URN) is the evolving standard being developed by the Internet Engineering Task Force (IETF) as one of the set of Uniform Resource Identifier (URI) standards which deals with naming conventions. A URN is the name of a resource that identifies a unit of information independent of its location. URNs improve upon URLs because they are intended to provide a globally unique, location-independent identifier that can be used for identification of the resource, and thus to facilitate access both to metadata ("data about data") about it and to the resource itself. The URN refers to the intellectual entity, while the URL refers to a particular physical entity at a particular location.

When Discussion Paper No. 96 was discussed in July 1996, Clifford Lynch, who is active in the IETF's URN group, reported on the progress being made on the issue. A new IETF working group had recently been formed that would attempt to use a few resolution schemes to experiment with the URN framework. The goal is to attempt to disentangle operational issues from syntactical ones. Because of the multiplicity of naming assigners and the complex relationships between URNs and a bibliographic record, he said that there may not be one clean answer concerning where to record a URN. MARBI was advised not to look to the IETF URN group to solve naming problems. In the future, identifiers already in place may serve as URNs, e.g. ISBNs, so an existing field could be used and a URN not necessarily carried explicitly.
MARBI asked to be kept informed on URN developments to further consider where to put them in MARC in the future.

2. DISCUSSION

Since Discussion Paper No. 96 was discussed, substantial progress has been made in the IETF on URN syntax and resolution. Although solutions have not been widely implemented, consensus has been reached on many issues. It is clear that any existing naming (identification) schemes will be accommodated in any URN solution. In support of this point, a paper was recently written by Cecilia Preston, Clifford Lynch and Ron Daniel on using existing bibliographic identifiers as Uniform Resource Names; it discusses fitting the ISBN, ISSN, and SICI into the URN framework and syntax. (URL: ftp://ds.internic.net/internet-drafts/draft-ietf-urn-biblio-00.txt)

Another URN candidate is the handle identifier that the Corporation for National Research Initiatives (CNRI) is developing for the Library of Congress under contract. The handle server provides a lookup service for electronic resources that are part of the National Digital Library Program. Each item in a digital library is given a handle, which is a globally unique, persistent, independent identifier. The handle server will then supply data that an online catalog system or other access software needs to access the item. The handle server thus resolves the handle into a URL (or other form of physical locator) to retrieve the resource. The handle server system is designed to be a globally interconnected system. A catalog or web-browser that can connect to any single handle-server can request resolution of any handle, without having to know in advance which handle server holds the registration information.

Recently ten institutions were awarded funds for digitization of historical material to be included in the American Memory Project through a grant from Ameritech.
These digitized items will be distributed on the Internet in a manner that will augment the collections of the National Digital Library Program at the Library of Congress. Access to the digital collection is to be given through metadata in MARC, in non-MARC records or in finding aids. Some of the awardees were interested in using MARC records for metadata and wanted to explore the use of URNs such as handles as persistent identifiers linking the record to the item.

To facilitate this effort and other projects demonstrating interoperability among catalogs and distributed digital repositories, the National Digital Library Program is considering how and where in the MARC record to record a handle. Two approaches might be used: 1) recording the "bare" handle (e.g. hdl:loc.pp.detroit/4a3271t) or 2) combining the handle with the Internet address of a proxy resolver into a "proxy URN" (e.g. http://hdl.handle.net/loc.pp.detroit/4a3271t). This proxy URN is a URL, which can be used by any system that can use a URL as a link but is no longer truly location-independent and persistent, because it relies on the continued existence of a particular proxy handle server.

The proxy URN approach corresponds closely to OCLC's Persistent URL (PURL) system; the term "PURL-handle" has sometimes been used instead of "proxy URN." OCLC's PURL system creates logical addresses in the form of URLs which are translated through a PURL resolver into the URL of the current physical location. OCLC runs a PURL resolver through which others may create PURLs for their resources and offers the software free to any institution wishing to run its own resolver. This is a short-term partial solution to the problem of names for Internet resources that change location, a solution that can be supported now by all browsers and catalogs that recognize URLs but relies on the continued existence of the particular PURL resolver through which the PURL has been registered.
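
The relationship between the two forms of a handle described above (the bare handle and the proxy URN) can be illustrated with a minimal Python sketch. It assumes, based only on the two examples given in this paper, that a proxy URN is formed by replacing the "hdl:" prefix with the address of the proxy resolver; the function names and constants are hypothetical and are not part of any handle software.

```python
# Illustrative sketch only. The prefix values are taken from the examples
# in this paper; the function names are hypothetical.
HANDLE_SCHEME = "hdl:"
PROXY_PREFIX = "http://hdl.handle.net/"


def to_proxy_urn(bare_handle: str) -> str:
    """Turn a bare handle (hdl:...) into a proxy URN (an http URL)."""
    if not bare_handle.startswith(HANDLE_SCHEME):
        raise ValueError("not a bare handle: " + bare_handle)
    return PROXY_PREFIX + bare_handle[len(HANDLE_SCHEME):]


def to_bare_handle(proxy_urn: str) -> str:
    """Recover the bare handle from a proxy URN built as above."""
    if not proxy_urn.startswith(PROXY_PREFIX):
        raise ValueError("not a proxy URN: " + proxy_urn)
    return HANDLE_SCHEME + proxy_urn[len(PROXY_PREFIX):]


if __name__ == "__main__":
    print(to_proxy_urn("hdl:loc.pp.detroit/4a3271t"))
```

The sketch makes the dependence on one particular resolver visible: the proxy form embeds the resolver's address, while the bare form does not.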
The plan is for handle resolvers to be networked so that it does not matter which particular resolver is approached. In the long run, recording a handle in its bare form is more desirable than turning it into a URL, since the handle is truly location-independent and therefore more persistent. However, until all catalogs, web-browsers, and other access systems recognize URNs (and hence handles), there will be a transitional need to record identifiers in both forms to support broad-based experiments and allow the same MARC record to provide an effective linking mechanism in several systems, some of which may not be capable of recognizing handles. Eventually the need to record the URLs for items for which handles (or other forms of URN) exist will disappear, but the transitional phase may take many years.

The Research Libraries Group is looking at similar naming issues in its "Studies in Scarlet" project. It expects to register PURL/handles for digitized items to function as URNs. RLG could also benefit from a resolution of the issue of where to put URNs in the MARC record.

3. FIELD 856 AND URNs

The National Digital Library Program (NDLP) would like to use repeatable 856$u subfields to record URNs such as handles and the URLs associated with them (either a proxy URN or other URL). Because of the multiplicity of 856 fields that could exist in a record (e.g., different formats of the same document, mirror sites, different subsets of the same document, etc.), using repeated 856 fields would result in the need to link these repeated fields in the record with subfield $8, since the URNs and URLs need to be linked. This is undesirable both because of its complexity and because subfield $8 has not yet been implemented as a link in systems. Currently NDLP is experimenting with multiple $u subfields in field 856, some of which are handles.
The field could include the following:

- an http URL pointing to the item
- a proxy URN (which is an http URL) attached to a handle; this approach treats the handle server as if it were a PURL resolver and can be used now to access the item by the handle
- a bare handle, which can currently only be used directly by systems that recognize handles and can communicate with a handle-server to request resolution. Appropriate software exists to incorporate URNs into recent versions of the most popular Web browsers.

The last approach offers all the advantages of URNs: persistence, global uniqueness, and location independence. Although it cannot yet be widely implemented, progress has been made at LC toward its future use. In addition, it is desirable to record it for the future while creating the bibliographic record. Some of the other institutions that have been awarded grants for digitization through the Ameritech competition are interested in using URNs (either handles or others).

In order to use subfield $u for URLs and URNs, the handle URN would need to include the initial letters: "urn:". This syntax has been accepted within the Internet Engineering Task Force's URN Working Group.

Example:

245 00 $aGottscho Schleissner Collection (Library of Congress) $h[graphic]
260 $cca. 1896-1970, bulk 1935-1955
300 $aca. 28,350 negatives :$bsafety film, some nitrate ;$c5 x 7 in. (13 x 18 cm.) or smaller.
300 $aca. 300 transparencies :$bfilm, color ;$c8 x 10 in. (21 x 26 cm.) or smaller
300 $a11 albums of photographic or photomechanical prints :$bsilver gelatin, some cyanotype, some color ;$c16.5 x 14 in. (42 x 35 cm.) or smaller
300 $aca. 275 photographic prints :$bb&w, silver gelatin ;$c17 x 14 in. (43 x 36 cm.) or smaller.
856 40 $uhttp://lcweb2.loc.gov/ammem/gschtml/gotthome.html
       $uhttp://hdl.handle.net/loc.test/gotthome
       $uurn:hdl:loc.test/gotthome

[Record is for original with 856 added for access to the digitized items.
First $u is a URL; second $u is a proxy URN; third $u is a bare handle. Three $u subfields have been shown here to cover the options described above. In practice there would seldom be a reason to use all three except for experimental projects. Two $u subfields would be more common.]

If a bare handle were used in field 856 subfield $u, the following changes would need to be made to the 856 field:

1) change the name of $u to "Uniform Resource Identifier (URI)"; this is the umbrella term used for the various UR standards
2) add an indicator value to the first indicator (Access method) to show that a URN is in the field. Although the definition of a new indicator value for "URN" could be considered, a more generic value might be added that shows that the item is not retrieved directly by an Internet access protocol, but that the name must first be resolved into a location. It may be preferable to limit additions to the indicator values, since the values are essentially redundant information that is also contained in the URN or URL itself. However, there are a few cases where the information in the indicator is not explicit in the field, such as resources accessible by email (e.g., listservs). If repeatable $u's are used as in the record above, guidelines could state that if there is at least one http URL, indicator value 4 (HTTP) is used.

Alternatively, a new subfield could be defined. However, there are only two subfields available: $e or $y. Since a URN is intended to be used for locating the item and systems are programmed to use $u as a hot link, it may be more appropriate to also use it for URNs.

Using a standard number field in the 0XX block is not a workable solution for all NDLP records. There is no assurance that there will be one URN per bibliographic record. For instance, handles are being assigned for different portions of the items, such as a separate one for a picture from a digitized book.
In addition, LC expects to create handles for a finding aid, which would be referenced in the record along with a handle for the object itself. In the future different naming authorities may assign URNs or handles according to different guidelines. Thus, there may not be a single place in the record to contain a URN, but several, depending on the level at which the bibliographic description is given and how the electronic resources are being referenced.

As other URN schemes are implemented, other proposals will be brought forward as needed to accommodate them. For instance, it is likely that the existing 020 field (ISBN) and 024 (Other Standard Number) could be used for URNs based on existing standard numbers such as the ISBN and SICI. A possible means to specify URNs based on standard numbers is to add a $u subfield to any of these fields. Specific encoding of the data will need to be worked out (i.e., whether to encode the complete URN as it is formulated, even though some of the information would be redundant with what is contained in $a, or only to indicate in some way that the number can also be used as a URN).

4. PROPOSED CHANGES

The following is presented for consideration:

* In the USMARC Bibliographic/Holdings/Classification/Community Information Formats, change the name of subfield $u (Uniform Resource Locator) in Field 856 to Uniform Resource Identifier.
* In the USMARC Bibliographic/Holdings/Classification/Community Information Formats, define the following value in Field 856, First indicator:
    # No information provided
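
As an illustration of how a system might apply the proposed changes, the following Python sketch classifies the $u values of an 856 field and selects a first indicator value. It assumes the repeated-$u approach of section 3, the "urn:" prefix described there, and the suggested guideline that value 4 is used when at least one http URL is present. The function names are hypothetical, and access methods other than HTTP are deliberately left out of the sketch.

```python
# Illustrative sketch only; not part of the USMARC format documentation.
# Function names and the proxy-resolver prefix are assumptions drawn from
# the example record in section 3.
PROXY_PREFIX = "http://hdl.handle.net/"


def classify_uri(value):
    """Label one $u value as 'urn', 'proxy_urn', or 'url'."""
    lowered = value.lower()
    if lowered.startswith("urn:"):
        return "urn"
    if lowered.startswith(PROXY_PREFIX):
        return "proxy_urn"
    return "url"


def first_indicator(u_values):
    """Choose the 856 first indicator for a list of $u values.

    Returns "4" if at least one http URL is present, "#" (blank) if the
    field holds only URNs, and None for cases this sketch does not cover
    (for example, a field holding only an FTP URL).
    """
    if any(v.lower().startswith("http://") for v in u_values):
        return "4"
    if u_values and all(classify_uri(v) == "urn" for v in u_values):
        return "#"
    return None


if __name__ == "__main__":
    example = [
        "http://lcweb2.loc.gov/ammem/gschtml/gotthome.html",
        "http://hdl.handle.net/loc.test/gotthome",
        "urn:hdl:loc.test/gotthome",
    ]
    for value in example:
        print(classify_uri(value), value)
    print("first indicator:", first_indicator(example))
```

A field that carries only a bare handle would receive the blank value under this sketch, which is the case the new indicator value is intended to cover.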