The Library of Congress >> Standards
IFLA/CDNL Alliance for Bibliographic Standards - Library of Congress
  URI Resource Pages Home >> About URIs

URI Resource Pages

About URIs


Definition of URI

URI stands for Uniform Resource Identifier. There is no widely accepted definition; the most general is:

A URI identifies a resource

Implicit in the term "Uniform Resource Identifier" are three concepts:

So what is a Resource Anyway?

There has never been an accepted definition of "resource" (in the URI context); it can be almost anything: a web page, document, database, image, service, (recursively) a collection of resources, a physical object (person, book, etc.), even a concept. One proposed definition is "anything that has identity". (Not very useful if we define URI as we have above: "a URI identifies a resource". That essentially reduces the "resource" definition to "anything a URI identifies", and the two definitions become circular. In fact it is probably not useful in any case, as it's difficult to say what does and does not have identity.) Another proposed definition is "anything that can be named" (which suffers because of the "name vs. location" controversy; more below). Other proposed definitions are not any more useful than these. So, "resource" is best left undefined.


URI Schemes

URIs are distinguished by their "scheme". The following character strings are URIs:

  1. http://www.loc.gov
  2. telnet://rs8.loc.gov
  3. mailto:someone@loc.gov
  4. ftp://ftp.loc.gov/pub/z3950/articles/kbr.ps
  5. z39.50s://melvyl.ucop.edu/cat
  6. info:srw/schema/1/mods-v3.0
  7. tag:hawke.org,2001-06-05:Taiko
  8. urn:nbn:de:gbv:089-3321752945

Their schemes, respectively, are 'http', 'telnet', 'mailto', 'ftp', 'z39.50s', 'info', 'tag' and 'urn'.

So the pattern is: a URI is a character string beginning with a scheme name followed by a colon (':'); the remainder of the URI is "scheme specific", meaning its interpretation depends on the scheme.
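The pattern can be illustrated with a minimal Python sketch. The function name `split_scheme` and the list name `EXAMPLES` are assumptions made for this illustration; the sketch only separates the scheme from the scheme-specific remainder at the first colon and does not validate either part.

```python
def split_scheme(uri: str) -> tuple[str, str]:
    """Split a URI into (scheme, scheme-specific part) at the first colon."""
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme found in {uri!r}")
    # Scheme names are compared case-insensitively, so normalize to lowercase.
    return scheme.lower(), rest


# The eight example URIs listed above.
EXAMPLES = [
    "http://www.loc.gov",
    "telnet://rs8.loc.gov",
    "mailto:someone@loc.gov",
    "ftp://ftp.loc.gov/pub/z3950/articles/kbr.ps",
    "z39.50s://melvyl.ucop.edu/cat",
    "info:srw/schema/1/mods-v3.0",
    "tag:hawke.org,2001-06-05:Taiko",
    "urn:nbn:de:gbv:089-3321752945",
]

schemes = [split_scheme(uri)[0] for uri in EXAMPLES]
```

Only the first colon matters here: later colons, as in the 'tag' and 'urn' examples, belong to the scheme-specific part.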

Some URI schemes correspond to a specific protocol. "Protocol" in this context means some rigorously defined procedure that describes what is supposed to happen when you activate (click on) a URI of that scheme. In that sense, in the above examples schemes 1-5 are protocols, 'info' and 'tag' (6, 7) are not, and for 'urn' (8) it depends on the URN namespace (see below).

Thus:

  • For some, the scheme name is the same as the protocol name, e.g. 'http', 'telnet', 'ftp';
  • for others, the scheme name is not a protocol name but still corresponds to a protocol, e.g. 'z39.50s' and 'mailto';
  • some schemes do not correspond to a protocol at all, for example, 'info' and 'tag'; and
  • in some cases, the scheme may correspond to multiple protocols, for example 'urn'.
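As a rough sketch, the four cases above can be written down as a lookup table in Python. The names `ProtocolRelation`, `RELATION`, and `protocol_relation` are hypothetical, and the table covers only the eight example schemes from this page, classified as the list above classifies them.

```python
from enum import Enum
from typing import Optional


class ProtocolRelation(Enum):
    SAME_NAME = "scheme name is also the protocol name"
    OTHER_NAME = "scheme corresponds to a protocol, but is not the protocol name"
    NONE = "scheme does not correspond to a protocol"
    MULTIPLE = "scheme may correspond to multiple protocols"


# Classification taken from the list above; nothing beyond it is asserted.
RELATION = {
    "http": ProtocolRelation.SAME_NAME,
    "telnet": ProtocolRelation.SAME_NAME,
    "ftp": ProtocolRelation.SAME_NAME,
    "z39.50s": ProtocolRelation.OTHER_NAME,
    "mailto": ProtocolRelation.OTHER_NAME,
    "info": ProtocolRelation.NONE,
    "tag": ProtocolRelation.NONE,
    "urn": ProtocolRelation.MULTIPLE,
}


def protocol_relation(uri: str) -> Optional[ProtocolRelation]:
    """Look up how a URI's scheme relates to a protocol (None if not listed)."""
    scheme = uri.partition(":")[0].lower()
    return RELATION.get(scheme)
```

A scheme missing from the table yields `None`, which keeps "not classified here" distinct from "does not correspond to a protocol".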

There is a list of URI schemes at http://www.iana.org/assignments/uri-schemes, the Official IANA Registry of URI Schemes.

IANA is the "Internet Assigned Numbers Authority". Links to IANA registries


What Does a URI Do?

Does a URI identify, locate, retrieve, dereference, name, resolve... or what?
  • Identify or Locate? Some say a URI identifies a resource. Some say it locates a resource.
  • ... or Name? There is a distinction among these three -- locate, name, identify -- some say it is too subtle to formalize, others disagree.
  • Locate or Retrieve? This has more to do with URLs in particular than URIs in general. Some say URLs locate a resource, that is, they identify its location. Some say they retrieve the resource.
  • Retrieve or "dereference"? See below.
  • Resolve? See below.

It may simplify to consider that a URI does one or the other (or both) of two things: identify and/or dereference.

Identify

An important class of URIs simply identifies a resource and is not intended to retrieve (dereference) or locate it. Some of these are pure identifiers, serving the same purpose that ISO OIDs did before URIs came along. 'info' URIs are in this class.

Before 'info' came along, people tended to use 'http' when a pure identifier was needed (for example, for RDF). In fact there are many of these legacy 'http' identifiers in use today, and even more being assigned. Some people think this is legitimate; others (particularly, many people in the library and publishing community) feel that 'http' is not a good scheme to use for a pure identifier, because an 'http' URI is a URL, and as such must be actionable.

Look at the list of SRW Schema identifiers and note that some of these are 'http' URIs and others, 'info'. The owner of the schema gets to decide which URI scheme to use.

Dereference

When you "click" on a URL (see below), something is supposed to happen; typically a web page appears. One might say that a retrieval has occurred: your web client has retrieved the resource (the web page) from the web server. The information retrieval community likes to think of this as retrieval, but that term has different connotations in other communities. Some argue that if you retrieve a resource it no longer resides at the location from which it was retrieved, since it cannot be in two places at once (the "retrieve a book" metaphor). Sometimes the awkward phrase "retrieve a representation" is used, but more popular is the term dereference, which means roughly the same thing.

Google has some interesting definitions of "dereference":

  • Access the value pointed to by a pointer.
  • Use a reference to access a data value.
  • Retrieve the value stored at the referenced address. (So here, retrieve and dereference really are the same!)
  • (and, interestingly) Resolve a reference. (So here, dereferencing and resolution are the same!) See resolution.

URIs, URLs, and URNs

As we said above, and almost everyone agrees, when you click on a URL something is supposed to happen. However, not all URIs are URLs -- not every URI is actionable in this sense (that when you click on it something happens) -- in particular and as we noted above, an 'info' URI is (in general) not actionable.

Originally, the URL, "Uniform Resource Locator", was conceived; the URI was a generalization of the URL concept, along with the URN, "Uniform Resource Name", which was an attempt to define (more or less) "persistent" identifiers. So for a time it was believed that URIs were partitioned into two classes, URLs and URNs. (And for a short time, three, with the addition of the URC, "Uniform Resource Citation", but that never went anywhere.) However, a different view held that the important distinction was between URI and URL (URIs identify and URLs locate) and URNs did not even fit into this model.

Much of this was sorted out (see RFC 3305 also published as W3C Note 21 September 2001) when it was agreed:

  • 'urn' is simply a URI scheme, like 'http' and 'info';
  • 'url' is not; but
  • URL is a useful but informal concept, referring to "a type of URI that identifies a resource via a representation of its primary access mechanism (e.g., its network 'location'), rather than by some other attributes it may have. Thus ..., 'http:' is a URI scheme. An http URI is a URL. The phrase 'URL scheme' is now used infrequently, usually to refer to some subclass of URI schemes which exclude URNs." (Quote from W3C Note.)

Resolution

A URN is (theoretically) a persistent identifier for a resource, independent of location or access method.

URN Conceptual Model

Conceptually, a URN maps to one or more URLs for the resource. When a user activates (clicks on) a URN, the browser finds the set of associated URLs, selects one (perhaps based on location, or perhaps on access method), and then attempts to retrieve the resource. If the attempt fails, the browser might try another URL in the list. All of this is transparent to the user.

If the resource is replicated on an additional server, a URL is added to the list. If the resource is removed from a server, a URL is deleted. If there is a single copy of the resource, and it is moved, the URL is updated. In any case the URN never changes.

The process of finding the list of URLs corresponding to a URN, and selecting one, as described in the model above, is called resolution.
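The conceptual model can be sketched in a few lines of Python. Everything named here is hypothetical: the `URN_TO_URLS` table, the URN and mirror URLs in it, and the `fake_fetch` stand-in for network retrieval are invented for illustration and do not describe any deployed resolution system.

```python
from typing import Callable, Optional

# Hypothetical resolution table: one URN mapped to the URLs of its copies.
URN_TO_URLS = {
    "urn:example:report-1": [
        "http://mirror-a.example.org/report-1",
        "http://mirror-b.example.org/report-1",
    ],
}


def resolve(urn: str, fetch: Callable[[str], Optional[str]]) -> Optional[str]:
    """Find the URLs for a URN and try each in turn until one fetch succeeds."""
    for url in URN_TO_URLS.get(urn, []):
        content = fetch(url)
        if content is not None:
            return content
    return None  # unknown URN, or every associated URL failed


def fake_fetch(url: str) -> Optional[str]:
    # Stand-in for real retrieval: pretend only mirror-b is reachable.
    return "report body" if "mirror-b" in url else None


result = resolve("urn:example:report-1", fake_fetch)
```

In this sketch, replicating, removing, or moving the resource changes only the list of URLs in the table; the URN used by the caller stays the same.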

URN Namespaces and Syntax

The universe of URNs is partitioned into namespaces. Each is assigned a namespace identifier (NID). See the IANA Registry of URN Namespaces. So a URN consists of:

  • the scheme - 'urn'
  • colon separator -  ':'
  • the NID, e.g. 'nbn'
  • another colon separator - ':'
  • a namespace specific string (NSS) e.g. 'de:gbv:089-3321752945'

So the URN for the NBN (National Bibliographic Number) 'de:gbv:089-3321752945' is:

urn:nbn:de:gbv:089-3321752945

Note that the namespace specific string in this example includes additional colon (':') separators, as prescribed by the definition for the specific URN namespace, in this case the National Bibliographic Number, described in RFC 3188. For each URN NID there is an NSS definition. In the NBN case, the definition prescribes additional structuring (including a country and subauthority); however, from the point of view of the URN syntax, the NSS is simply an opaque, flat string.
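The syntax above can be sketched in Python as follows. The names `Urn` and `parse_urn` are assumptions for this illustration, and the sketch checks only the overall scheme:NID:NSS shape, not the namespace-specific rules that each NID defines for its NSS.

```python
from typing import NamedTuple


class Urn(NamedTuple):
    nid: str  # namespace identifier, e.g. 'nbn'
    nss: str  # namespace specific string, treated as opaque


def parse_urn(urn: str) -> Urn:
    """Split a URN into its NID and NSS at the first two colons only."""
    parts = urn.split(":", 2)
    if len(parts) != 3:
        raise ValueError(f"expected 'urn:<NID>:<NSS>', got {urn!r}")
    scheme, nid, nss = parts
    if scheme.lower() != "urn" or not nid or not nss:
        raise ValueError(f"not a URN: {urn!r}")
    # Any further colons stay inside the NSS; URN syntax does not interpret them.
    return Urn(nid, nss)


parsed = parse_urn("urn:nbn:de:gbv:089-3321752945")
```

Limiting the split to two colons is what keeps the NSS flat: the country and subauthority structure of the NBN example is left for NBN-aware code to interpret.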

So what happened to URNs?

URNs never caught on because they tried to be too many things and never really nailed down which:

  • A persistent URL
  • Location independent
  • A resolution system
  • A pure identifier

Persistence and location independence came to be thought of more as social than technical problems. Other approaches were developed rather than formalizing the URN concept.

The proposed URN resolution system never was fully developed. And resolution is incompatible with the role of pure identifier.

More about why URNs are not suitable as pure identifiers

  December 4, 2007