IFLA/CDNL Alliance for Bibliographic Standards - Library of Congress

URI Resource Pages

XML Namespace URIs (and schema location URIs, and schema identifiers)

This note addresses the question "what form should an XML namespace URI take?" and compares and contrasts XML namespace URIs with schema location URIs, and schema identifiers.

XML Namespace URI: What Scheme?

Should an XML namespace URI be an ‘http:’ URL, or should it be based on another URI scheme: ‘urn:’ or ‘info:’?

For example, the MODS version 3 namespace identifier is:

http://www.loc.gov/mods/v3

However, the MODS version 4 namespace identifier might take the form:

info:xmlns/mods-4.0

As another example of this alternative form - OASIS is developing a schema for codelist representation.  Hypothetically it could use the namespace identifier:

info:xmlns/oasis-codelist

There are arguments for either form: (1) 'http:', and (2) either 'info:' or 'urn:'. This note advocates the second form. This is not a technical argument; rather, it suggests that the confusion caused by using 'http:' URIs for namespaces outweighs the gains, and that the world of XML is simpler when namespace URIs use a scheme other than 'http:'. (Nor is this an argument for a rule prohibiting 'http:' URIs as namespaces; it simply advocates not using them.)

Identifiers and Locators

In the library community we consider a namespace URI to be a pure identifier, one whose purpose is to enable two parties to agree that they are talking about the same thing. It is an identifier rather than a locator, and while some experts believe this is an artificial distinction, in the library world we believe it is a very real distinction. To illustrate this point we contrast a namespace URI with a schema location URL.  

  • First, the schema location URL.
    An XML document typically includes a schemaLocation attribute (at the root element) which includes a URL for a location where the schema for that XML document can be found. It is an 'http:' URL, appropriately so: it is provided so that the XML processor may (if it so chooses) retrieve the schema for validation. This URL is not unique; the schema could be located in many places.

    This latter point is critically important: 
    the schema-location URL is not a unique identifier for the schema.

  • A namespace URI is completely different.
    An XML processor does not dereference the namespace URI. It might look it up in an internal table to see if the XML namespace it identifies is among the namespaces it knows about. That is its sole purpose. Whether there really is a document located by the namespace URI is irrelevant. That document doesn’t conform to any prescribed, machine-ingestible format, because there is no such thing. And there probably never will be. So even if the XML processor were to retrieve this document, there isn’t anything it could do with it.
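
The contrast above can be sketched in a few lines of Python. This is only an illustration, not a description of any particular XML processor: the KNOWN_NAMESPACES table and the namespace_of helper are hypothetical names introduced here. The sketch parses a small MODS fragment and treats the namespace URI as an opaque string to be looked up in an internal table; nothing is retrieved from the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal table of namespaces this processor knows about,
# keyed by the namespace URI treated as an opaque string.
KNOWN_NAMESPACES = {
    "http://www.loc.gov/mods/v3": "MODS version 3",
}

DOCUMENT = """\
<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Cats</title></titleInfo>
</mods>
"""


def namespace_of(element):
    """Return the namespace URI of an element, or None if it has none."""
    # ElementTree stores qualified names as '{namespace-uri}localname'.
    if element.tag.startswith("{"):
        return element.tag[1:].split("}", 1)[0]
    return None


# Parsing compares and stores the namespace URI; it does not fetch it.
root = ET.fromstring(DOCUMENT)
ns = namespace_of(root)
print(ns, "->", KNOWN_NAMESPACES.get(ns, "unknown namespace"))
```

The lookup is a plain string comparison, so it would work identically if the namespace URI were, say, info:xmlns/mods-4.0.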

Confusion and its Repercussions

XML namespaces are important – more important to some communities than others, but very important within the library community. And namespaces are an abstract concept, difficult enough to grasp without further obfuscation. Those who are not well schooled in the subtleties and abstractions of XML have difficulty grasping the meaning and importance of XML namespaces, and this includes the majority of people who deal with XML. When so many namespace URIs look much like schema-location URLs, many XML users simply write off one or the other as extraneous and irrelevant. The result is that much of the world either misunderstands or distorts the difference between a schema and a namespace. This has had some fairly nasty repercussions:

  • Software developers simply ignoring namespaces altogether, or refusing to support mixing of namespaces within an XML document.
  • Schema developers refusing to mix namespaces, resulting in identical definitions duplicated over multiple schemas.
  • People creating XML content becoming thoroughly confused about schemas and namespaces.

Historical Debate

It is fairly well-accepted history that 'http:' URIs became fashionable as identifiers for abstract objects (where the primary function is identification rather than location), out of convenience, to utilize the well-established DNS and its ability to provide unambiguous names. There was some resistance to this practice, from people who felt that ‘http:’ implied a protocol. Supporters advanced these counter-arguments:

  1. A careful reading of the HTTP protocol reveals that pure identification is a legitimate use (like it or not).
  2. A URI should always resolve to something, even if only a human-readable description of the abstract entity it identifies.

Counterarguments, respectively:

  1. The ‘http:’ protocol is defined by a complex document that most people will never read carefully, and it is frustrating, and frankly condescending, to be told by protocol experts in effect, “you just misunderstand”.  That’s the point, it’s too complicated! Many very bright people, well schooled in XML, are confused!

  2. The use of a single URI for these dual purposes -- on one hand, identifying an abstract concept, and on the other hand retrieving a description of that concept -- has probably been the single largest source of confusion over the roles of URIs and URLs. A URI by definition identifies a resource. If a namespace is identified by a URI, and a description of that namespace is identified by a URI, then it follows (from that same definition) that the namespace and the description are both resources. Clearly, they are not the same resource. It therefore follows that they need different URIs.

Web experts counter-argue: “yes, they are (or can be viewed as) the same resource. The same resource with different representations.”  Thus we have a fundamental disagreement.

Fundamental Disagreement: an Item and its metadata - one resource or two?

In the library community, an item -- an article, for example -- and a metadata record describing the item, are two separate resources. They have different identifiers.  This is the fundamental point of departure from which there doesn’t seem to be a path to conciliation. 

There has been so much discussion and disagreement over the past decade about what “resource” means, and what it means to “dereference a resource”, that it may never be resolved.  Many library experts, though not educated in these esoteric nuances, are actually quite proficient with most aspects of XML. They have simply chosen not to spend inordinate time on these abstract, theoretical matters, instead taking a simple, straightforward view:

 If a resource is identified by a URI, and a description of that resource is identified by a URI, they must be different URIs.

Identifying vs.  Describing an Abstract Concept

Consider the URI:

info:ofi/fmt:kev:mtx:book

It identifies an abstract concept: “book”.  The URI is used within the OpenURL standard as a metadata element indicating the format, or material type, “book”, as distinguished from other material types such as “dissertation”, “patent”, and “journal”, each of which has a corresponding identifier.

This URI does not get dereferenced.   You can retrieve a description of it:

http://alcme.oclc.org/openurl/servlet/OAIHandler?
verb=GetRecord&metadataPrefix=oai_dc&identifier=info:ofi/fmt:kev:mtx:book

These are two different URIs. One is an identifier for the abstract concept. The other is a URL for the description of that concept.
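
As a minimal sketch of this distinction, the following Python builds the description URL from the identifier. The function name description_url is hypothetical; the base URL and query parameters are simply those of the example above, and nothing here is meant to describe how the registry itself is implemented.

```python
from urllib.parse import urlencode

# The identifier for the abstract concept "book". It is never dereferenced.
BOOK_ID = "info:ofi/fmt:kev:mtx:book"

# Base URL of the service in the example above.
REGISTRY_BASE = "http://alcme.oclc.org/openurl/servlet/OAIHandler"


def description_url(identifier):
    """Build the URL of a description of the identified concept."""
    query = urlencode(
        {
            "verb": "GetRecord",
            "metadataPrefix": "oai_dc",
            "identifier": identifier,
        },
        safe=":/",  # keep ':' and '/' readable, as in the example URL
    )
    return REGISTRY_BASE + "?" + query


url = description_url(BOOK_ID)
print(url)
```

The identifier appears inside the description URL as a parameter value, but the two strings remain distinct URIs with distinct roles.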

'info:' xmlns Namespace

The Library of Congress has registered (*) an 'info:' namespace for LC identifiers. (See Info URIs for Library of Congress Identifiers.)

* pending approval.

Its syntax is

info:lc/<lc-info-subspace>/<subspace-specific-string>

One of the values for <lc-info-subspace>   is 'xmlns', thus for example the URI

info:lc/xmlns/rmd-v1

is the XML namespace URI for the SRU record metadata namespace.   
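
The syntax above is simple enough to split mechanically. The following Python is a sketch under the assumption that the subspace-specific string is everything after the second slash; LcInfoUri and parse_lc_info_uri are hypothetical names introduced for illustration.

```python
from typing import NamedTuple


class LcInfoUri(NamedTuple):
    subspace: str   # the <lc-info-subspace> part, e.g. 'xmlns'
    local: str      # the <subspace-specific-string> part


def parse_lc_info_uri(uri):
    """Split 'info:lc/<lc-info-subspace>/<subspace-specific-string>'."""
    prefix = "info:lc/"
    if not uri.startswith(prefix):
        raise ValueError("not an info:lc URI: %r" % uri)
    # Split at the first '/' only; the rest belongs to the local string.
    subspace, sep, local = uri[len(prefix):].partition("/")
    if not sep or not subspace or not local:
        raise ValueError("missing subspace or local part: %r" % uri)
    return LcInfoUri(subspace, local)


parsed = parse_lc_info_uri("info:lc/xmlns/rmd-v1")
print(parsed.subspace, parsed.local)
```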

  

Another Dimension:  the Schema Identifier

We have compared and contrasted schema location URLs with XML namespace URIs. Here, we introduce a third object type: the schema identifier URI.

In an XML document the value of a schema location attribute is a space-separated pair of values: the XML namespace URI and the schema location.   This is a terribly confusing construct. The schema location URL is useful, but tying it to the namespace is confusing because (1) a schema may have more than one namespace; and (2) a namespace may be used by more than one schema.
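
To make the construct concrete, here is a short Python sketch that reads the attribute and splits its value into (namespace URI, schema location) pairs. The helper name schema_location_pairs is hypothetical; the namespace and location used are the MODS values that appear elsewhere in this note.

```python
import xml.etree.ElementTree as ET

# Namespace of the schemaLocation attribute itself (XML Schema instance).
XSI = "http://www.w3.org/2001/XMLSchema-instance"

DOCUMENT = """\
<mods xmlns="http://www.loc.gov/mods/v3"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-2.xsd"/>
"""


def schema_location_pairs(root):
    """Return the attribute value as (namespace URI, location URL) pairs."""
    tokens = root.get("{%s}schemaLocation" % XSI, "").split()
    # Tokens alternate: namespace, location, namespace, location, ...
    return list(zip(tokens[0::2], tokens[1::2]))


pairs = schema_location_pairs(ET.fromstring(DOCUMENT))
for namespace_uri, location_url in pairs:
    print("namespace:", namespace_uri)
    print("location: ", location_url)
```

In each pair the first value is only compared, never fetched; the second is the one a processor may retrieve.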

If the XML-specification developers had considered the concept of a schema identifier, then certainly they would have used that instead of the XML namespace URI, that is, the schemaLocation attribute would have instead been defined as a space-separated pair: schema identifier, schema location. That would have avoided much confusion.

But the concept of a schema identifier is foreign to the XML world at large, although well known within the library community.

With the SRU protocol a client requests that a server send a record according to a particular schema - MODS, for instance. The client might say "please search the database for records with 'cat' in the title, and send five records, using the MODS schema". The request includes a variety of parameters, one of which identifies the MODS schema. That identifier is:

info:srw/schema/1/mods-v3.0

(See SRU 'info:' URIs and also.)

The MODS schema identifier never gets dereferenced. The server receiving it either recognizes it or not. If it recognizes it, it either supports it or not. So one of the following holds - the server:

  1. Recognizes and supports it -- it sends back MODS records.
  2. Recognizes it but doesn't support it -- it sends a diagnostic "we don't send MODS records".
  3. Doesn’t recognize it -- it sends a diagnostic "schema id not recognized".

 In any case there is no scenario by which the server would dereference the URI, go get the schema, and then formulate and send MODS records. Most importantly, although (2) and (3) are failed transactions, most likely neither failed because of a misunderstanding (i.e. a misinterpreted URI). They failed because the server was unable to provide the requested service, not because it misunderstood the request.
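
The three outcomes can be sketched as a small Python function. This is not SRU server code: the function name respond, the contents of the two sets, and the unrecognized identifier used at the end are all assumptions made for illustration, and the diagnostic strings are the informal ones quoted above rather than actual SRU diagnostics.

```python
# Hypothetical server configuration: identifiers are opaque strings.
RECOGNIZED = {"info:srw/schema/1/mods-v3.0", "info:srw/schema/1/mods-v3.2"}
SUPPORTED = {"info:srw/schema/1/mods-v3.0"}


def respond(schema_id, recognized=RECOGNIZED, supported=SUPPORTED):
    """Decide the outcome by set membership alone; nothing is dereferenced."""
    if schema_id not in recognized:
        return "diagnostic: schema id not recognized"
    if schema_id not in supported:
        return "diagnostic: we don't send records in this schema"
    return "records in the requested schema"


print(respond("info:srw/schema/1/mods-v3.0"))
print(respond("info:srw/schema/1/mods-v3.2"))
print(respond("info:srw/schema/1/unknown-v0"))  # made-up identifier
```

Every branch is a string comparison against what the server already knows, which is why a failure here reflects capability rather than misunderstanding.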

Why not use the schema location to identify the schema? Because the schema may reside at several locations and if the client uses one, the server might recognize it by another, resulting in a failed transaction even though the server was capable of fulfilling the request. So the use of a schema identifier provides a higher level of interoperability (interoperability, not necessarily success) than a schema location.

There is a well-publicized table of these identifiers (see SRU Record Schemas), and for each there is a schema location URL supplied, so that a developer can retrieve the schema out-of-band. For example the unique URI info:srw/schema/1/mods-v3.2  identifies the schema, located in several places, one of which is http://www.loc.gov/standards/mods/v3/mods-3-2.xsd .  To identify the schema you use the first; to retrieve it, the second.
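
A brief Python sketch of why the identifier interoperates where a location may not. The table below is a stand-in for the published one; the second location is a hypothetical mirror invented for this example, as are the variable names.

```python
# One identifier, several locations. The second URL is a made-up mirror.
SCHEMA_LOCATIONS = {
    "info:srw/schema/1/mods-v3.2": [
        "http://www.loc.gov/standards/mods/v3/mods-3-2.xsd",
        "http://mirror.example.org/mods-3-2.xsd",
    ],
}

SCHEMA_ID = "info:srw/schema/1/mods-v3.2"

# Suppose client and server each know the schema by a different copy.
client_location = SCHEMA_LOCATIONS[SCHEMA_ID][0]
server_location = SCHEMA_LOCATIONS[SCHEMA_ID][1]

# Matching by location fails even though both mean the same schema.
match_by_location = client_location == server_location

# Matching by identifier succeeds: both sides use the one unique URI.
match_by_identifier = SCHEMA_ID == "info:srw/schema/1/mods-v3.2"

print("by location:  ", match_by_location)
print("by identifier:", match_by_identifier)
```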

Conclusion and Summary

An XML namespace is an abstract resource. There may also be a resource that describes it. We’ve argued that the namespace and description are different resources and need different URIs.

We’ve described three types of URIs that are important to an XML schema:

  1. XML namespace URI
  2. Schema location URL
  3. Schema identifier URI

We’ve demonstrated how the schema location and schema identifier are related (they are a locator and an identifier, respectively) and that an XML namespace URI also falls into the identifier category.

We have argued that the confusion caused by using 'http:' URIs for XML namespaces outweighs the gains.  Treating an XML namespace as an abstract resource will reduce confusion and the XML world will be a kinder, gentler place.

 


These pages are maintained at the Library of Congress by the Network Development and MARC Standards Office, as part of its participation in the IFLA CDNL Alliance for Bibliographic Standards (ICABS), to provide information relevant to the library community about URIs, identifiers, locators, and related concepts. Comments, corrections, and suggestions are welcome; please email rden@loc.gov.

  August 22, 2007