URI Resource Pages
XML Namespace URIs (and schema location URIs, and
schema identifiers)
This note addresses the question "what form should an XML namespace
URI take?" and compares XML namespace URIs with schema location
URIs and schema identifiers.
XML Namespace URI: What Scheme?
Should an XML namespace URI be an ‘http:’ URL, or
should it be based on another URI scheme: ‘urn:’ or ‘info:’?
For example, the MODS version
3 namespace identifier is:
http://www.loc.gov/mods/v3
However, the MODS version 4 namespace identifier might take the
form:
As another example of this alternative form - OASIS is developing
a schema for codelist representation. Hypothetically it could
use the namespace identifier:
info:xmlns/oasis-codelist
There are arguments for either form: (1) 'http:', and (2) either
'info:' or 'urn:'. This note advocates the second form. This is
not a technical argument; rather, it suggests that the confusion
caused by using 'http:' URIs for namespaces outweighs the gains,
and that the world of XML is simpler when namespace URIs use a scheme
other than 'http:'. (Nor is this an argument for a rule prohibiting
'http:' URIs as namespaces; it simply advocates not using them.)
Identifiers and Locators
In the library community we consider a namespace URI to be a pure identifier,
one whose purpose is to enable two parties to agree that they
are talking about the same thing. It is an identifier rather
than a locator, and while some experts believe this is
an artificial distinction, in the library world we believe it is
a very real distinction. To illustrate this point we contrast
a namespace URI with a schema location URL.
- First, the schema location URL.
An XML document typically includes a schemaLocation attribute
(at the root element) which includes a URL for a location where
the schema for that XML document can be found. It is an 'http:'
URL, appropriately so: it is provided so that the XML processor
may (if it so chooses) retrieve the schema for validation.
This URL is not unique; the schema could be located in many
places.
This latter point is critically important:
the schema-location URL is not a unique identifier for the
schema.
- A namespace URI is completely different.
An XML processor does not dereference the namespace URI. It might
look it up in an internal table to see if the XML namespace
it identifies is among the namespaces it knows about. That
is its sole purpose. Whether there really is a document located
by the namespace URI is irrelevant. That document doesn’t
conform to any prescribed, machine-ingestible format, because
there is no such thing. And there probably never will be. So
even if the XML processor were to retrieve this document, there
isn’t anything it could do with it.
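The contrast above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular XML processor: the `KNOWN_NAMESPACES` table, its label, and the sample document are assumptions made for the example. The point it shows is that the namespace URI is handled as an opaque string and looked up in an internal table, and nothing is retrieved from the network.

```python
import xml.etree.ElementTree as ET

# Hypothetical internal table: the namespace URI is only a lookup key.
KNOWN_NAMESPACES = {
    "http://www.loc.gov/mods/v3": "MODS version 3",
}

# Illustrative document; the title content is made up for the example.
DOC = """<mods xmlns="http://www.loc.gov/mods/v3">
  <titleInfo><title>Example</title></titleInfo>
</mods>"""


def namespace_of(element):
    """Return the namespace URI of an element, or None if it has none."""
    tag = element.tag  # ElementTree stores qualified names as "{uri}local"
    if tag.startswith("{"):
        return tag[1:].split("}", 1)[0]
    return None


root = ET.fromstring(DOC)          # parsing does not dereference the URI
ns = namespace_of(root)            # plain string comparison from here on
label = KNOWN_NAMESPACES.get(ns)   # None would mean "namespace not known"
print(ns, "->", label)
```

Whether anything exists at the address spelled by the namespace URI makes no difference to this code; only string equality matters.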
Confusion and its Repercussions
XML namespaces are important – more important to some communities
than others, but very important within the library community. And
namespaces are an abstract concept, difficult enough to grasp without
further obfuscation. Those who are not well schooled in the subtleties
and abstractions of XML have difficulty grasping the meaning and
importance of XML namespaces, and this includes the majority of
people who deal with XML. When so many namespace URIs look much
like schema-location URLs, many XML users simply write off one
or the other as extraneous and irrelevant. The result is that much
of the world either misunderstands or distorts the difference between
a schema and a namespace. This has had some fairly nasty repercussions:
- Software developers simply ignoring namespaces altogether,
or refusing to support mixing of namespaces within an XML document.
- Schema developers refusing to mix namespaces, resulting in
identical definitions duplicated over multiple schemas.
- People creating XML content becoming thoroughly confused about
schemas and namespaces.
Historical Debate
It is fairly well-accepted history that 'http:' URIs became fashionable
as identifiers for abstract objects (where the primary function
is identification rather than location), out of convenience, to
utilize the well-established DNS and its ability to provide unambiguous
names. There was some resistance to this practice, from people
who felt that ‘http:’ implied a protocol. Supporters
advanced these counter-arguments:
- A careful reading of the HTTP protocol reveals that pure identification
is a legitimate use (like it or not).
- A URI should always resolve to something, even if only a human-readable
description of the abstract entity it identifies.
Rebuttals to these, respectively:
- The ‘http:’ protocol is defined by a complex document
that most people will never read carefully, and it is frustrating,
and frankly condescending, to be told by protocol experts in
effect, “you just misunderstand”. That’s
the point: it’s too complicated! Many very bright people,
well schooled in XML, are confused!
- The use of a single URI for these dual purposes -- on one hand,
identifying an abstract concept, and on the other hand retrieving
a description of that concept -- has probably been the single
largest source of confusion over the roles of URIs and URLs.
A URI by definition identifies a resource. If a namespace is
identified by a URI, and a description of that namespace is identified
by a URI, then it follows (from that same definition) that the
namespace and the description are both resources. Clearly, they
are not the same resource. It therefore follows that they need
different URIs.
Web experts counter-argue: “yes, they are (or can be viewed
as) the same resource. The same resource with different representations.” Thus
we have a fundamental disagreement.
Fundamental Disagreement: an Item and its metadata - one resource
or two?
In the library community, an item -- an article, for example --
and a metadata record describing the item, are two separate resources.
They have different identifiers. This is the fundamental
point of departure from which there doesn’t seem to be a
path to conciliation.
There has been so much discussion and disagreement over the past
decade about what “resource” means, and what it means
to “dereference a resource”, that it may never be resolved. Many
library experts, though not educated in these esoteric nuances,
are actually quite proficient with most aspects of XML. They have
simply chosen not to spend inordinate time on these abstract, theoretical
matters, instead taking a simple, straightforward view:
If a resource is identified by a URI, and a description
of that resource is identified by a URI, they must be different
URIs.
Identifying vs. Describing an Abstract Concept
Consider the URI:
info:ofi/fmt:kev:mtx:book
It identifies an abstract concept: “book”. The
URI is used within the OpenURL standard as a metadata element indicating
the format, or material type, “book”, as distinguished
from other material types such as “dissertation”, “patent”,
and “journal”, each of which has a corresponding identifier.
This URI does not get dereferenced. You can retrieve
a description of it:
These are two different URIs. One is an identifier for the abstract
concept. The other is a URL for the description of that concept.
'info:' xmlns Namespace
The Library of Congress has registered (*) an 'info:' namespace
for LC identifiers. (See Info
URIs for Library of Congress Identifiers.)
* pending approval.
Its syntax is:
info:lc/<lc-info-subspace>/<subspace-specific-string>
One of the values for <lc-info-subspace>
is 'xmlns', thus for example the URI
info:lc/xmlns/rmd-v1
is the XML namespace URI for the SRU record metadata namespace.
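As a rough sketch of how the syntax above decomposes, the following Python function splits such a URI into its two variable parts. The function name is hypothetical, and the sketch assumes an exact, lowercase `info:lc/` prefix and a non-empty subspace and string; it does not check the result against any registry of subspaces.

```python
def parse_lc_info_uri(uri):
    """Split 'info:lc/<lc-info-subspace>/<subspace-specific-string>'.

    Returns (subspace, subspace_specific_string); raises ValueError
    if the URI does not have that shape.
    """
    prefix = "info:lc/"
    if not uri.startswith(prefix):
        raise ValueError("not an info:lc URI: %r" % uri)
    # Split at the first "/" only; the rest belongs to the subspace string.
    subspace, sep, rest = uri[len(prefix):].partition("/")
    if not sep or not subspace or not rest:
        raise ValueError("missing subspace or subspace-specific string: %r" % uri)
    return subspace, rest


subspace, name = parse_lc_info_uri("info:lc/xmlns/rmd-v1")
print(subspace, name)
```

The parsing is purely textual: the URI is taken apart as a string and never used to fetch anything.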
Another Dimension: the Schema Identifier
We have compared and contrasted schema location URLs with XML
namespace URIs. Here, we introduce a third object type: the schema
identifier URI.
In an XML document the value of a schema location attribute is
a space-separated pair of values: the XML namespace URI and the
schema location. This is a terribly confusing construct.
The schema location URL is useful, but tying it to the namespace
is confusing because (1) a schema may have more than one namespace;
and (2) a namespace may be used by more than one schema.
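To make the construct concrete, here is a small Python sketch that reads the schemaLocation attribute and splits it into (namespace URI, schema location) pairs. The helper name and the sample document are assumptions for illustration. The attribute value is treated as whitespace-separated tokens taken two at a time, so a document that lists several namespaces yields several pairs.

```python
import xml.etree.ElementTree as ET

XSI = "http://www.w3.org/2001/XMLSchema-instance"

# Illustrative document using URIs that appear elsewhere in this note.
DOC = """<mods xmlns="http://www.loc.gov/mods/v3"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://www.loc.gov/mods/v3
                          http://www.loc.gov/standards/mods/v3/mods-3-2.xsd"/>"""


def schema_location_pairs(root):
    """Return [(namespace_uri, schema_location_url), ...] for an element."""
    value = root.get("{%s}schemaLocation" % XSI, "")
    tokens = value.split()  # any run of whitespace separates the tokens
    if len(tokens) % 2 != 0:
        raise ValueError("schemaLocation needs an even number of URIs")
    return list(zip(tokens[0::2], tokens[1::2]))


pairs = schema_location_pairs(ET.fromstring(DOC))
for namespace_uri, location in pairs:
    print("identifier:", namespace_uri, "| locator:", location)
```

In each pair the first value is used only for matching and the second only for retrieval, even though both may look like 'http:' URLs.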
If the XML-specification developers had considered the concept
of a schema identifier, then certainly they would have used that
instead of the XML namespace URI, that is, the schemaLocation attribute
would have instead been defined as a space-separated pair: schema
identifier, schema location. That would have avoided much confusion.
But the concept of a schema identifier is foreign to the XML world
at large, although well known within the library community.
With the SRU protocol a
client requests that a server send a record according to a particular
schema - MODS,
for instance. The client might say "please search the database
for records with 'cat' in the title, and send five records, using
the MODS schema". The request includes a variety of parameters,
one of which identifies the MODS schema. That identifier is:
info:srw/schema/1/mods-v3.0
(See also SRU 'info:' URIs.)
The MODS schema identifier never gets dereferenced. The server
receiving it either recognizes it or not. If it recognizes it,
it either supports it or not. So one of the following holds - the
server:
- Recognizes and supports it: it sends back MODS records.
- Recognizes it but doesn't support it: it sends a diagnostic, "we
don't send MODS records".
- Doesn’t recognize it: it sends a diagnostic, "schema
id not recognized".
In any case there is no scenario by which the server would
dereference the URI, go get the schema, and then formulate and
send MODS records. Most importantly, although the second and third cases are failed
transactions, most likely neither failed because of a misunderstanding
(i.e. a misinterpreted URI). They failed because the server was
unable to provide the requested service, not because it misunderstood
the request.
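The three outcomes can be sketched as a simple set lookup in Python. This is an illustration under stated assumptions, not SRU server code: the `RECOGNIZED` and `SUPPORTED` sets describe a hypothetical server, the `Outcome` wording paraphrases the diagnostics quoted above and is not actual SRU diagnostic text, and `info:example/unknown-schema` is a made-up identifier.

```python
from enum import Enum


class Outcome(Enum):
    RECORDS = "records sent in the requested schema"
    UNSUPPORTED = "diagnostic: schema recognized but not supported"
    UNRECOGNIZED = "diagnostic: schema id not recognized"


# Hypothetical server configuration, keyed on schema identifiers.
RECOGNIZED = {"info:srw/schema/1/mods-v3.0", "info:srw/schema/1/mods-v3.2"}
SUPPORTED = {"info:srw/schema/1/mods-v3.0"}


def handle_record_schema(schema_id):
    """Decide the outcome by string comparison; nothing is dereferenced."""
    if schema_id not in RECOGNIZED:
        return Outcome.UNRECOGNIZED
    if schema_id not in SUPPORTED:
        return Outcome.UNSUPPORTED
    return Outcome.RECORDS


for requested in ("info:srw/schema/1/mods-v3.0",
                  "info:srw/schema/1/mods-v3.2",
                  "info:example/unknown-schema"):
    print(requested, "->", handle_record_schema(requested).value)
```

Every branch is decided by comparing identifier strings, so the two failing outcomes reflect what the server can do rather than any misreading of the request.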
Why not use the schema location to identify the schema? Because
the schema may reside at several locations and if the client uses
one, the server might recognize it by another, resulting in a failed
transaction even though the server was capable of fulfilling the
request. So the use of a schema identifier provides a higher level
of interoperability (interoperability, not necessarily success)
than a schema location.
There is a well-publicized table of these identifiers (see SRU
Record Schemas), and for each there is a schema location
URL supplied, so that a developer can retrieve the schema out-of-band.
For example the unique URI info:srw/schema/1/mods-v3.2 identifies
the schema, located in several places, one of which is http://www.loc.gov/standards/mods/v3/mods-3-2.xsd . To identify the
schema you use the first; to retrieve it, the second.
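The relationship between one identifier and several locations can be sketched as a small table in Python. The table structure and function name are assumptions for illustration, and the `mirror.example.org` URL is a hypothetical second copy, not a real location of the schema.

```python
# One identifier, possibly many locations.
SCHEMA_LOCATIONS = {
    "info:srw/schema/1/mods-v3.2": [
        "http://www.loc.gov/standards/mods/v3/mods-3-2.xsd",
        "http://mirror.example.org/schemas/mods-3-2.xsd",  # hypothetical mirror
    ],
}


def identifier_for_location(url):
    """Return the schema identifier registered for a location, or None."""
    for schema_id, locations in SCHEMA_LOCATIONS.items():
        if url in locations:
            return schema_id
    return None


# A client and a server that each know the schema by a different location.
client_ref = "http://mirror.example.org/schemas/mods-3-2.xsd"
server_ref = "http://www.loc.gov/standards/mods/v3/mods-3-2.xsd"

locations_match = client_ref == server_ref
identifiers_match = (identifier_for_location(client_ref)
                     == identifier_for_location(server_ref))
print("match by location:", locations_match)
print("match by identifier:", identifiers_match)
```

Comparing locations fails here even though both parties mean the same schema, while comparing identifiers succeeds, which is the interoperability gain described above.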
Conclusion and Summary
An XML namespace is an abstract resource. There may also be a
resource that describes it. We’ve argued that the namespace
and description are different resources and need different URIs.
We’ve described three types of URIs that are important to
an XML schema:
- XML namespace URI
- Schema location URL
- Schema identifier URI
We’ve demonstrated how the schema location and schema identifier
are related -- that they are a locator and identifier, respectively
-- and that an XML namespace
URI also falls into the identifier category.
We have argued that the confusion caused by using 'http:' URIs
for XML namespaces outweighs the gains. Treating
an XML namespace as an abstract resource will reduce confusion
and the XML world will be a kinder, gentler place.
These pages are maintained at the Library of Congress by the
Network Development and MARC Standards Office, as part of its
participation in the IFLA CDNL Alliance for Bibliographic Standards
(ICABS), to provide information relevant to the library community
about URIs, identifiers, locators, and related concepts. Comments,
corrections, and suggestions are welcome; please email rden@loc.gov.